Abstract. Spirtes, Glymour and Scheines [Causation, Prediction, and
Search (1993) Springer] described a pointwise consistent estimator of
the Markov equivalence class of any causal structure that can be
represented by a directed acyclic graph for any parametric family with
a uniformly consistent test of conditional independence, under the
Causal Markov and Causal Faithfulness assumptions. Robins et al.
[Biometrika 90 (2003) 491-515], however, proved that there are no
uniformly consistent estimators of Markov equivalence classes of causal
structures under those assumptions. Subsequently, Kalisch and Bühlmann
[J. Mach. Learn. Res. 8 (2007) 613-636] described a uniformly
consistent estimator of the Markov equivalence class of a linear
Gaussian causal structure under the Causal Markov and Strong Causal
Faithfulness assumptions. However, the Strong Faithfulness assumption
may be false with high probability in many domains. We describe a
uniformly consistent estimator of both the Markov equivalence class of
a linear Gaussian causal structure and the identifiable structural
coefficients in the Markov equivalence class under the Causal Markov
assumption and the considerably weaker k-Triangle-Faithfulness
assumption.

Key words and phrases: Causal inference, uniform consistency,
structural equation models, Bayesian networks, model selection, model
search, estimation.


1. INTRODUCTION 

A principal aim of many sciences is to model causal systems well
enough to provide sound insight into their structures and mechanisms
and to provide reliable predictions about the effects of policy
interventions. The modeling process is typically divided into two
distinct phases: a model specification phase in which some model (with
free parameters) is specified, and a parameter estimation and
statistical testing phase in which the free parameters of the specified
model are estimated and various hypotheses are put to a statistical
test. Both model specification and parameter estimation can fruitfully
be thought of as search problems.

As pointed out in Robins et al. (2003), common statistical wisdom
dictates that causal effects cannot be consistently estimated from
observational studies alone unless one observes and adjusts for all
possible confounding variables, and knows the time order in which
events occurred. However, Spirtes, Glymour and Scheines (1993) and
Pearl (2000) developed a framework in which causal relationships are
represented by edges in a directed acyclic graph. They also described
asymptotically consistent procedures for determining features of causal
structure from data even if we allow for the possibility of unobserved
confounding variables and/or an unknown time order, under two
assumptions: the Causal Markov assumption (roughly, given no unmeasured
common causes, each variable is independent of its noneffects
conditional on its direct causes) and the Causal Faithfulness
assumption (all conditional independence relations that hold in the
distribution are entailed by the Causal Markov assumption). Under these
assumptions, the procedures they propose (e.g., the SGS and the PC
algorithms, which assume no unmeasured common causes, and the FCI
algorithm, which allows unmeasured common causes) can infer the
existence or absence of causal relationships. In particular, Spirtes
et al. (1993), Chapters 5 and 6, proved the Fisher consistency of these
procedures. Pointwise consistency follows from the Fisher consistency
and the uniform consistency of the test procedures for conditional
independence relationships in certain parametric families that the
procedures use.

Robins et al. (2003) proved that under the Causal Markov and
Faithfulness assumptions made in Spirtes, Glymour and Scheines (1993)
there are no uniformly consistent procedures for estimating features of
the causal structure from data, even when there are no unmeasured
common causes. Spirtes, Glymour and Scheines (2000), Kalisch and
Bühlmann (2007) and Colombo et al. (2012) introduced a Strong Causal
Faithfulness assumption, which, roughly speaking, assumes that no
conditional independence relation not entailed by the Causal Markov
assumption "almost" holds. Kalisch and Bühlmann (2007) and Colombo
et al. (2012) showed that under this strengthened Causal Faithfulness
assumption, some modifications of the pointwise consistent procedures
developed in Spirtes, Glymour and Scheines (1993) are uniformly
consistent. Maathuis et al. (2010) have also successfully applied these
procedures to various biological data sets, experimentally confirming
some of the causal inferences made by the procedures.

However, the question remains whether the Strong Causal Faithfulness
assumption made by Kalisch and Bühlmann (2007) is too strong. Is it
likely to be true? Some analysis done by Uhler et al. (2013) indicates
that the strengthened Causal Faithfulness assumption is likely to be
false, especially when there are a large number of variables.

In this paper we investigate a number of different ways in which the
strengthened Causal Faithfulness assumption can be weakened, while
still retaining the guarantees of uniformly consistent estimation by
modifying the causal estimation procedures. It is not clear whether the
ways we propose to weaken the Strong Causal Faithfulness assumption
make it substantially more likely to hold, nor is it clear that all of
the modifications that we propose to the estimation procedures make
them substantially more accurate in practice. Nevertheless, we believe
that the modifications that we propose are a useful first step toward
investigating fruitful modifications of the Causal Faithfulness
assumption and causal estimation procedures.

In Section 2 we describe the basic setup and assumptions for causal
inference. In Section 3 we examine various ways to weaken the Causal
Faithfulness assumption and modifications of the estimation procedures
that preserve pointwise consistency. In Section 4 we examine weakening
the Strong Causal Faithfulness assumption and modification of the
estimation procedures that preserves uniform consistency. Finally, in
Section 5 we summarize the results and describe areas of future
research.

2. THE BASIC ASSUMPTIONS FOR CAUSAL 
INFERENCE 

We first introduce the graph terminology that we will use. Individual
variables are denoted with italicized capital letters, and sets of
variables are denoted with bold-faced capital letters. A graph
G = (V, E) consists of a set of vertices V and a set of edges
E ⊆ V × V, where for each ⟨X, Y⟩ ∈ E, X ≠ Y. If ⟨X, Y⟩ ∈ E and
⟨Y, X⟩ ∈ E, there is an undirected edge between X and Y, denoted by
X - Y. If ⟨X, Y⟩ ∈ E and ⟨Y, X⟩ ∉ E, there is a directed edge between
X and Y, denoted by X → Y. If there is a directed edge from X to Y, or
from Y to X, or there is an undirected edge between X and Y, then X and
Y are adjacent in G. Adj(G, X) is the set of vertices adjacent to X. If
all of the edges in a graph G are directed edges, then G is a directed
graph. A path between X_1 and X_n in G is an ordered sequence of
vertices ⟨X_1, ..., X_n⟩ such that for 1 < i ≤ n, X_{i-1} and X_i are
adjacent in G. A path between X_1 and X_n in G is a directed path if
for 1 < i ≤ n, the edge between X_{i-1} and X_i is a directed edge from
X_{i-1} to X_i. A path is acyclic if no vertex occurs on the path
twice. A directed graph is acyclic (DAG) if all directed paths are
acyclic. X is a parent of Y and Y is a child of X if there is an edge
X → Y. ⟨X, Y, Z⟩ is a triangle in G if X is adjacent to Y and Z, and Y
is adjacent to Z.

Suppose G is a graph. Parents(G, X) is the set of parents of X in G. X
is an ancestor of Y (and Y is a descendant of X) if there is a directed
path from X to Y. A subset of V is ancestral if it is closed under the
ancestor relation. A triple of vertices ⟨X, Y, Z⟩ is unshielded if X is
adjacent to Y and Y is adjacent to Z, but X is not adjacent to Z. A
triple of vertices ⟨X, Y, Z⟩ is a collider if there are edges
X → Y ← Z. A triple of vertices ⟨X, Y, Z⟩ is a noncollider if X is
adjacent to Y and Y is adjacent to Z, but it is not a collider.
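The definitions of unshielded triples and colliders can be made concrete with a short sketch. It assumes, purely for illustration, that a DAG is stored as a set of (parent, child) pairs; the function names `adjacent` and `unshielded_triples` are hypothetical and do not come from the paper.

```python
from itertools import combinations

def adjacent(edges, x, y):
    """True if the DAG has a directed edge between x and y in either direction."""
    return (x, y) in edges or (y, x) in edges

def unshielded_triples(vertices, edges):
    """List (x, y, z, is_collider) for every unshielded triple with middle vertex y.

    Assumption: `edges` is a set of (parent, child) pairs describing a DAG.
    """
    found = []
    for y in vertices:
        nbrs = [v for v in vertices if v != y and adjacent(edges, v, y)]
        for x, z in combinations(nbrs, 2):
            if not adjacent(edges, x, z):  # unshielded: x and z are not adjacent
                is_collider = (x, y) in edges and (z, y) in edges  # x -> y <- z
                found.append((x, y, z, is_collider))
    return found

# Example DAG: X -> Y <- Z and Y -> W.
example_edges = {("X", "Y"), ("Z", "Y"), ("Y", "W")}
example_triples = unshielded_triples(["X", "Y", "Z", "W"], example_edges)
```

In this example the triple with endpoints X and Z is an unshielded collider, while the two triples ending in W are unshielded noncolliders.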

A probability distribution P over a set of variables V satisfies the
(local directed) Markov condition for a DAG G if and only if each
variable X in V is independent of the set of variables that are neither
parents nor descendants of X in G, conditional on the parents of X in
G. A Bayesian network is an ordered pair ⟨P, G⟩ where P satisfies the
local directed Markov condition for G. If M = ⟨P, G⟩, P_M denotes P and
G_M denotes G. Two DAGs G_1 and G_2 over the same set of variables V
are said to be Markov equivalent if all of the conditional independence
relations entailed by satisfying the local directed Markov condition
for G_1 are also entailed by satisfying the local directed Markov
condition for G_2, and vice versa. A useful characterization of Markov
equivalence between DAGs is that two DAGs are Markov equivalent if and
only if they have the same adjacencies and the same unshielded
colliders (Verma and Pearl, 1990). A Markov equivalence class M is a
set of DAGs that contains all DAGs that are Markov equivalent to each
other. A Markov equivalence class M can be represented by a graph
called a pattern; a pattern O is a graph such that (i) if X → Y in
every DAG in M, then X → Y in O; and (ii) if X → Y in some DAG in M and
Y → X in some other DAG in M, then X - Y in O. In that case O is said
to represent M and each DAG in M.
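The Verma and Pearl characterization lends itself to a direct check. The sketch below is only an illustration under stated assumptions: both graphs are DAGs over the same vertex set, each stored as a set of (parent, child) pairs, and the helper names are hypothetical.

```python
from itertools import combinations

def skeleton(edges):
    """The adjacencies of a DAG, as a set of unordered vertex pairs."""
    return {frozenset(e) for e in edges}

def unshielded_colliders(edges):
    """All triples (a, child, b) with a -> child <- b and a, b not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for p, c in edges:
        parents.setdefault(c, set()).add(p)
    result = set()
    for child, ps in parents.items():
        for a, b in combinations(sorted(ps), 2):
            if frozenset((a, b)) not in skel:
                result.add((a, child, b))
    return result

def markov_equivalent(edges1, edges2):
    """Same adjacencies and same unshielded colliders (assumes a shared vertex set)."""
    return (skeleton(edges1) == skeleton(edges2)
            and unshielded_colliders(edges1) == unshielded_colliders(edges2))

chain = {("X", "Y"), ("Y", "Z")}      # X -> Y -> Z
reversed_chain = {("Z", "Y"), ("Y", "X")}  # X <- Y <- Z
collider = {("X", "Y"), ("Z", "Y")}   # X -> Y <- Z
```

The two chains share a pattern (X - Y - Z), whereas the collider graph forms a Markov equivalence class of its own.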

If the set X is independent of the set Y conditional on the set Z, we
write I(X, Y | Z); if X, Y and Z are individual variables, we likewise
write I(X, Y | Z). In a DAG G, a vertex A is active on an acyclic path
U between X and Y conditional on set Z of vertices (not containing X or
Y) if A = X or A = Y, or A is a noncollider on U and not in Z, or A is
a collider on U that is in Z or has a descendant in Z. An acyclic path
U is active conditional on a set Z of vertices if every vertex on the
path is active relative to Z. If X ≠ Y and Z does not contain X or Y, X
is d-separated from Y conditional on Z if there is no active acyclic
path between X and Y conditional on Z; otherwise X and Y are
d-connected conditional on Z. For three disjoint sets X, Y and Z, X is
d-separated from Y conditional on Z if there is no acyclic active path
between any member of X and any member of Y conditional on Z; otherwise
X and Y are d-connected conditional on Z. If X is d-separated from Y
conditional on Z in DAG G, then I(X, Y | Z) in every probability
distribution that satisfies the local directed Markov condition for G
(Pearl, 1988). Any conditional independence relation that holds in
every distribution that satisfies the local directed Markov condition
for DAG G is entailed by G. Note, however, that in some distributions
that satisfy the local directed Markov condition for G, some
conditional independence relation I(X, Y | Z) may hold even if X is not
d-separated from Y conditional on Z in G; such distributions are said
to be unfaithful to G.
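For small graphs, the definition of d-separation can be checked by enumerating acyclic paths and testing whether every interior vertex is active. The following is a minimal sketch that follows the definition literally; it assumes a DAG stored as a set of (parent, child) pairs, uses hypothetical function names, and takes exponential time in the worst case, so it is not meant as an efficient implementation.

```python
def descendants(edges, v):
    """All vertices reachable from v by a directed path of length at least one."""
    found, stack = set(), [v]
    while stack:
        u = stack.pop()
        for p, c in edges:
            if p == u and c not in found:
                found.add(c)
                stack.append(c)
    return found

def d_separated(edges, x, y, z):
    """True if x and y (distinct, not in z) are d-separated given z in the DAG."""
    z = set(z)
    nbrs = {}
    for p, c in edges:
        nbrs.setdefault(p, set()).add(c)
        nbrs.setdefault(c, set()).add(p)

    def active(prev, mid, nxt):
        # mid is a collider on the path when both path edges point into it.
        if (prev, mid) in edges and (nxt, mid) in edges:
            return mid in z or bool(descendants(edges, mid) & z)
        return mid not in z

    def search(path):
        # Depth-first search over acyclic paths whose interior vertices are all active.
        last = path[-1]
        if last == y:
            return True
        for n in nbrs.get(last, ()):
            if n in path:
                continue
            if len(path) >= 2 and not active(path[-2], last, n):
                continue
            if search(path + [n]):
                return True
        return False

    return not search([x])

chain_dag = {("X", "Y"), ("Y", "Z")}                  # X -> Y -> Z
collider_dag = {("X", "Y"), ("Z", "Y"), ("Y", "W")}   # X -> Y <- Z, Y -> W
```

On the chain, conditioning on Y blocks the only path; on the collider graph, conditioning on Y or on its descendant W activates the path between X and Z.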

There are a number of different parameterizations of a DAG G, which
map G onto distributions that satisfy the local directed Markov
condition for G. One common parameterization is a recursive linear
Gaussian structural equation model. A recursive linear Gaussian
structural equation model is an ordered triple ⟨G, Eq, Σ⟩, where G is a
DAG over a set of vertices X_1, ..., X_n, Eq is a set of equations, one
for each X_i, such that

\[ X_i = \sum_{X_j \in \mathrm{Parents}(G, X_i)} b_{j,i} X_j + \varepsilon_i, \]

where the b_{j,i} are real constants known as the structural
coefficients, and the ε_i are multivariate Gaussian that are jointly
independent of each other with covariance matrix Σ. The ε_i are
referred to as "error terms." In vector notation, where X is the vector
of X_1, ..., X_n, B is the matrix of structural coefficients, and ε is
the vector of error terms,

\[ \mathbf{X} = B\mathbf{X} + \varepsilon. \]

The covariance matrix Σ over the error terms, together with the
structural equations, determines a distribution over the variables in
X, which satisfies the local directed Markov condition for G. Hence,
the DAG in a recursive linear Gaussian structural equation model M
together with the probability distribution generated by the equations
and the covariance matrix over the error terms form a Bayesian network.
Because the joint distribution over the nonerror terms of a linear
Gaussian structural equation model is multivariate Gaussian, X is
independent of Y conditional on Z in P_M if and only if
ρ_M(X, Y | Z) = 0, where ρ_M(X, Y | Z) denotes the conditional or
partial correlation between X and Y conditional on Z according to P_M.
Let e_M(X → Z) denote the structural coefficient of the X → Z edge in
G_M. If there is no edge X → Z in G_M, then e_M(X → Z) = 0. If X and Z
are adjacent in G_M, then e_M(X - Z) = e_M(X → Z) if there is an X → Z
edge in G_M, and otherwise e_M(X - Z) = e_M(Z → X).
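A small numerical sketch may help connect structural coefficients, partial correlations and unfaithfulness. It assumes independent errors with a diagonal covariance matrix, a vertex list given in topological order, and coefficients stored as `coef[(j, i)]` for the edge from X_j to X_i; the coefficient values and all function names are chosen for illustration only. In the example, the direct effect of X on Z exactly cancels the effect through Y, so X and Z are uncorrelated even though they are adjacent, while their partial correlation given Y is nonzero.

```python
import math

def implied_covariance(order, coef, err_var):
    """Covariance implied by X_i = sum_j coef[(j, i)] * X_j + eps_i.

    order: vertices in a topological order of the DAG (assumption).
    coef: {(parent, child): structural coefficient}.
    err_var: {vertex: variance of its independent error term}.
    """
    total = {}  # total[i][k] = coefficient of eps_k in the reduced form of X_i
    for i in order:
        row = {k: 0.0 for k in order}
        row[i] = 1.0
        for (j, child), b in coef.items():
            if child == i:
                for k in order:
                    row[k] += b * total[j][k]
        total[i] = row
    return {(i, l): sum(total[i][k] * total[l][k] * err_var[k] for k in order)
            for i in order for l in order}

def corr(cov, a, b):
    return cov[(a, b)] / math.sqrt(cov[(a, a)] * cov[(b, b)])

def partial_corr(cov, a, b, c):
    """Partial correlation of a and b given the single variable c."""
    r_ab, r_ac, r_bc = corr(cov, a, b), corr(cov, a, c), corr(cov, b, c)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

# Triangle X -> Y -> Z with X -> Z; the two routes from X to Z cancel.
coef = {("X", "Y"): 1.0, ("Y", "Z"): 1.0, ("X", "Z"): -1.0}
err_var = {"X": 1.0, "Y": 1.0, "Z": 1.0}
cov = implied_covariance(["X", "Y", "Z"], coef, err_var)
rho_xz = corr(cov, "X", "Z")
rho_xz_given_y = partial_corr(cov, "X", "Z", "Y")
```

The resulting distribution satisfies the Markov condition for the triangle but is unfaithful to it, because the independence of X and Z is not entailed by the graph.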

There is a causal interpretation of recursive linear Gaussian
structural equation models, in which setting (as in an experiment, as
opposed to observing) the value of X_i to the fixed value x is
represented by replacing the structural equation for X_i with the
equation X_i = x. Under the causal interpretation, a recursive linear
structural equation model is a causal model, the DAG G_M is a causal
DAG, and the pattern that represents G_M is a causal pattern. A causal
model with a set of variables V is causally sufficient when every
common direct cause of any two variables in V is also in V. Informally,
under a causal interpretation, an edge X → Y in G_M represents that X
is a direct cause of Y relative to V. A causal model of a population is
true when the model correctly predicts the results of all possible
settings of any subset of the variables (Pearl, 2000).

There are two assumptions made about the relationship between the
causal DAG and the population probability distribution that play a key
role in causal inference from observational data. A discussion of the
implications of these assumptions, arguments for them, and a discussion
of conditions when they should not be assumed are given in Spirtes,
Glymour and Scheines (1993), pages 32-42. In this paper, we will
consider only those cases where the causal relations in a given
population can be represented by a model whose graph is a DAG.

Causal Markov assumption (CMA). If the true causal model M of a
population is causally sufficient, every variable in V is independent
of the variables that are neither its parents nor descendants in G_M
conditional on its parents in G_M.

Causal Faithfulness assumption (CFA). Every conditional independence
relation that holds in the population probability distribution is
entailed by the true causal DAG of the population.


The Causal Markov and Causal Faithfulness assumptions together entail
that X is independent of Y conditional on Z in the population if and
only if X is d-separated from Y conditional on Z in the true causal
graph.

3. WEAKENING THE CAUSAL 
FAITHFULNESS ASSUMPTION 

A number of algorithms for causal estimation have been proposed that
rely on the assumption of the causal sufficiency of the observed
variables, the Causal Markov assumption and the Causal Faithfulness
assumption. The SGS algorithm (Spirtes, Glymour and Scheines, 1993,
page 82), for example, is a Fisher consistent estimator of causal
patterns under these assumptions. (This, together with a uniformly
consistent test of conditional independence, entails that the SGS
algorithm is a pointwise consistent estimator of causal patterns.)

In this section we explore ways to weaken the Causal Faithfulness
assumption that still allow pointwise consistent estimation of
(features of) causal structure, and we illustrate the ideas by going
through a sequence of generalizations of the population version of the
SGS algorithm. None of the results in this section depend upon assuming
Gaussianity or linearity. The basic idea is that although the Causal
Faithfulness assumption is not fully testable (without knowing the true
causal structure), it has testable components given the Causal Markov
assumption. Under the Causal Markov assumption, the Causal Faithfulness
assumption entails that the probability distribution admits a perfect
DAG representation, that is, a DAG that entails all and only those
conditional independence relations true of the distribution. Whether
there is such a DAG depends only on the distribution, and so is, in
theory, testable. In principle, then, one may adopt a
weaker-than-faithfulness assumption and test (rather than assume) the
testable part of the faithfulness condition.

The SGS algorithm takes an oracle of conditional independence as
input, and outputs a graph on the given set of variables with both
directed edges and undirected edges.

SGS algorithm.

S1. Form the complete undirected graph H on the given set of
    variables V.

S2. For each pair of variables X and Y in V, search for a subset S of
    V \ {X, Y} such that X and Y are independent conditional on S.
    Remove the edge between X and Y in H if and only if such a set is
    found.

S3. Let K be the graph resulting from S2. For each unshielded triple
    ⟨X, Y, Z⟩ (i.e., X and Y are adjacent, Y and Z are adjacent, but X
    and Z are not adjacent),

    (i) If X and Z are not independent conditional on any subset of
        V \ {X, Z} that contains Y, then orient the triple as a
        collider: X → Y ← Z.

    (ii) If X and Z are not independent conditional on any subset of
        V \ {X, Z} that does not contain Y, then mark the triple as a
        noncollider (i.e., not X → Y ← Z).

S4. Execute the following orientation rules until none of them applies:

    (i) If X → Y - Z, and the triple ⟨X, Y, Z⟩ is marked as a
        noncollider, then orient Y - Z as Y → Z.

    (ii) If X → Y → Z and X - Z, then orient X - Z as X → Z.

    (iii) If X → Y ← Z, another triple ⟨X, W, Z⟩ is marked as a
        noncollider, and W - Y, then orient W - Y as W → Y. (This rule
        was not in the original SGS or PC algorithm, but added by Meek,
        1995.)
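Steps S1 to S3 can be sketched as a brute-force search over conditioning sets. The code below is only a sketch under stated assumptions: the oracle is a function `indep(x, y, s)` returning True when x and y are independent conditional on the frozenset `s`, the orientation rules of S4 are omitted, and the function and oracle names are hypothetical. The example oracles are hand-written for two three-variable DAGs.

```python
from itertools import chain, combinations

def subsets(items):
    """All subsets of `items`, as tuples, smallest first."""
    items = sorted(items)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))

def sgs_s1_to_s3(variables, indep):
    """Return (adjacencies, colliders, noncolliders) from steps S1 to S3."""
    variables = sorted(variables)
    # S1 and S2: start from the complete graph and drop the edge X - Y
    # exactly when some subset of V \ {X, Y} renders X and Y independent.
    adj = set()
    for x, y in combinations(variables, 2):
        others = set(variables) - {x, y}
        if not any(indep(x, y, frozenset(s)) for s in subsets(others)):
            adj.add(frozenset((x, y)))
    # S3: classify each unshielded triple <x, y, z>.
    colliders, noncolliders = set(), set()
    for y in variables:
        nbrs = [v for v in variables if frozenset((v, y)) in adj]
        for x, z in combinations(nbrs, 2):
            if frozenset((x, z)) in adj:
                continue  # shielded triple
            cond_sets = [frozenset(s) for s in subsets(set(variables) - {x, z})]
            if all(not indep(x, z, s) for s in cond_sets if y in s):
                colliders.add((x, y, z))      # clause (i)
            elif all(not indep(x, z, s) for s in cond_sets if y not in s):
                noncolliders.add((x, y, z))   # clause (ii)
    return adj, colliders, noncolliders

def collider_oracle(x, y, s):
    """Hand-written oracle for X -> Y <- Z: only I(X, Z | {}) holds."""
    return {x, y} == {"X", "Z"} and len(s) == 0

def chain_oracle(x, y, s):
    """Hand-written oracle for X -> Y -> Z: only I(X, Z | {Y}) holds."""
    return {x, y} == {"X", "Z"} and s == frozenset({"Y"})

collider_result = sgs_s1_to_s3(["X", "Y", "Z"], collider_oracle)
chain_result = sgs_s1_to_s3(["X", "Y", "Z"], chain_oracle)
```

Both oracles yield the same adjacencies; the first produces the collider X → Y ← Z and the second marks the triple as a noncollider.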

Assuming the oracle of conditional independence is perfectly reliable
(which we will do throughout this section), the SGS algorithm is
correct under the Causal Markov and Faithfulness assumptions, in the
sense that its output is the pattern that represents the Markov
equivalence class containing the true causal DAG (Spirtes, Glymour and
Scheines, 1993, page 82; Meek, 1995).

The correctness of SGS follows from the following three properties of
d-separation (Spirtes, Glymour and Scheines (1993)):

1. X is adjacent to Y in DAG G if and only if X is not d-separated
from Y conditional on any subset of the other variables in G.

2. If ⟨X, Y, Z⟩ is an unshielded collider in DAG G, then X is not
d-separated from Z conditional on any subset of the other variables in
G that contains Y.

3. If ⟨X, Y, Z⟩ is an unshielded noncollider in DAG G, then X is not
d-separated from Z conditional on any subset of the other variables in
G that does not contain Y.


We shall not reproduce the full proof here, but a few points are worth
stressing. First, S2 is the step of inferring adjacencies and
nonadjacencies. The inferred adjacencies, represented by the remaining
edges in the graph resulting from S2, are correct because of the Causal
Markov assumption alone: every DAG Markov to the given oracle must
contain at least these adjacencies. On the other hand, the inferred
nonadjacencies (via removal of edges) are correct because of the Causal
Faithfulness assumption, or, more precisely, because of the following
consequence of the Causal Faithfulness assumption, which we, following
Ramsey, Zhang and Spirtes (2006), will refer to as
Adjacency-Faithfulness.

Adjacency-Faithfulness assumption. Given a set of variables V whose
true causal DAG is G, if two variables X, Y are adjacent in G, then
they are not independent conditional on any subset of V \ {X, Y}.

Under the Adjacency-Faithfulness assumption, any edge removed in S2 is
correctly removed, because any DAG with the adjacency violates the
Adjacency-Faithfulness assumption.

Second, the key step of inferring orientations is step S3, in which
unshielded colliders and noncolliders are inferred. Given that the
adjacencies and nonadjacencies are all correct, the clauses (i) and
(ii) in step S3, as formulated here, are justified by the Causal Markov
assumption alone. Take clause (i), for example. If the unshielded
triple ⟨X, Y, Z⟩ is not a collider in the true causal DAG, then the
Causal Markov assumption entails that X and Z are independent
conditional on some set that contains Y. That is why clause (i) is
sound. A similar argument shows that clause (ii) is sound. This does
not mean, however, that the Causal Faithfulness assumption does not
play any role in justifying S3. Notice that the antecedent of (i) and
that of (ii) do not exhaust the logical possibilities. They leave out
the possibility that X and Z are independent conditional on some set
that contains Y and independent conditional on some set that does not
contain Y. This omission is justified by the Causal Faithfulness
assumption, or, more precisely, by the following consequence of the
Causal Faithfulness assumption (Ramsey, Zhang and Spirtes (2006)):

Orientation-Faithfulness assumption. Given a set of variables V whose
true causal DAG is G, let ⟨X, Y, Z⟩ be any unshielded triple in G:

1. If X → Y ← Z, then X and Z are not independent conditional on any
subset of V \ {X, Z} that contains Y;

2. Otherwise, X and Z are not independent conditional on any subset of
V \ {X, Z} that does not contain Y.

Obviously, the possibility left out by S3 is indeed ruled out by the
Orientation-Faithfulness assumption.

The Orientation-Faithfulness assumption, if true, justifies a much
simpler and more efficient step than S3: for every unshielded triple
⟨X, Y, Z⟩, we need check only the set found in S2 that renders X and Z
independent; the triple is a collider if and only if the set does not
contain Y. This simplification is used in the PC algorithm, a
well-known, more computationally efficient rendition of the SGS
procedure (Spirtes, Glymour and Scheines, 1993, pages 84-85). Moreover,
the Adjacency-Faithfulness condition also justifies a couple of
measures to improve the efficiency of S2, used by the PC algorithm.
Here we are concerned with showing how the basic SGS procedure may be
modified to be correct under increasingly weaker assumptions of
faithfulness, so we will not go into the details of the optimization
measures in the PC algorithm. Whether these or similar measures are
available to the modified algorithms we introduce below is an important
question to be addressed in future work.

Let us start with the modification proposed by Ramsey, Zhang and
Spirtes (2006), who observed that assuming the Causal Markov and
Adjacency-Faithfulness assumptions are true, any failure of the
Orientation-Faithfulness assumption is detectable, in the sense that
the probability distribution in question is not both Markov and
Faithful to any DAG (Zhang and Spirtes, 2008). In our formulation of
the SGS algorithm, it is easy to see how failures of
Orientation-Faithfulness can be detected. As already mentioned, the
role of the Orientation-Faithfulness assumption in justifying the SGS
algorithm is to guarantee that at step S3, either the antecedent of (i)
or that of (ii) will obtain. Therefore, if it turns out that for some
unshielded triple neither antecedent is satisfied, the
Orientation-Faithfulness assumption is detected to be false for that
triple.

This suggests a simple modification to S3 in the SGS algorithm.

S3*. Let K be the undirected graph resulting from S2. For each
unshielded triple ⟨X, Y, Z⟩,

(i) If X and Z are not independent conditional on any subset of
V \ {X, Z} that contains Y, then orient the triple as a collider:
X → Y ← Z.

(ii) If X and Z are not independent conditional on any subset of
V \ {X, Z} that does not contain Y, then mark the triple as a
noncollider.

(iii) Otherwise, mark the triple as ambiguous (or unfaithful).
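The three clauses of S3* amount to a three-way classification of each unshielded triple. A minimal sketch, assuming the same kind of oracle function `indep(x, z, s)` over frozensets and a hypothetical function name `classify_triple`, could look as follows; the example oracle is hand-written to describe a distribution in which X and Z are independent both marginally and conditional on Y.

```python
from itertools import combinations

def classify_triple(x, y, z, variables, indep):
    """Apply clauses (i)-(iii) of S3* to the unshielded triple <x, y, z>."""
    others = sorted(set(variables) - {x, z})
    with_y, without_y = [], []
    for r in range(len(others) + 1):
        for s in combinations(others, r):
            (with_y if y in s else without_y).append(frozenset(s))
    if all(not indep(x, z, s) for s in with_y):
        return "collider"      # clause (i)
    if all(not indep(x, z, s) for s in without_y):
        return "noncollider"   # clause (ii)
    return "ambiguous"         # clause (iii)

def unfaithful_oracle(x, z, s):
    """Hand-written oracle: X and Z are independent given {} and given {Y}."""
    return {x, z} == {"X", "Z"}

label = classify_triple("X", "Y", "Z", ["X", "Y", "Z"], unfaithful_oracle)
```

For this oracle neither antecedent holds, so the triple is marked ambiguous rather than forced into an orientation.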

Ramsey, Zhang and Spirtes (2006) applied essentially this modification
to the PC algorithm and called the resulting algorithm the Conservative
PC (CPC) algorithm. (Their results show that the main optimization
measures used in the PC algorithm still apply to this generalization of
SGS because the Adjacency-Faithfulness condition is still assumed.) We
will thus call the algorithm that results from replacing S3 with S3*
the Conservative SGS (CSGS) algorithm.

It is straightforward to prove that the CSGS algorithm is correct
under the Causal Markov and the Adjacency-Faithfulness assumptions
alone, in the sense that if the Causal Markov and
Adjacency-Faithfulness assumptions are true and if the oracle of
conditional independence is perfectly reliable, then every adjacency,
nonadjacency, orientation and marked noncollider in the output of the
CSGS are correct. As pointed out in Ramsey, Zhang and Spirtes (2006),
the output of the CSGS can be understood as an extended pattern that
represents a set of patterns. For example, a sample output used in
Ramsey, Zhang and Spirtes (2006) is given in Figure 1(a). There are two
ambiguous unshielded triples in the output: ⟨Y, X, Z⟩ and ⟨Z, U, Y⟩,
which are marked by crossing straight lines. Note that there is no
explicit mark for noncolliders, with the understanding that all and
only unshielded triples that are not oriented as colliders or marked as
ambiguous are (implicitly) marked noncolliders. Figure 1(a) represents
a set of three patterns, depicted in Figure 1(b)-(d). Each pattern
results from some disambiguation of the ambiguous triples in
Figure 1(a). The pattern in Figure 1(b), for example, results from
taking the triple ⟨Y, X, Z⟩ as a noncollider and taking the triple
⟨Z, U, Y⟩ as a collider. Note that not every disambiguation results in
a pattern. Taking both ambiguous triples as noncolliders would force a
directed cycle: Z → U → X → Z, and so would not lead to a pattern. That
is why there are only three instead of four patterns in the set
represented by Figure 1(a).

It is easy to see that when the Orientation-Faithfulness assumption
happens to hold, the CSGS output will be a single pattern (i.e.,
without ambiguous triples), which is the same as the SGS output. In
other words, CSGS is as informative as SGS when the stronger assumption
needed for the output of the latter to be guaranteed to be correct
happens to be true.

Fig. 1. (a) is a sample output of the CSGS algorithm. The ambiguous
(or unfaithful) unshielded triples are marked by straight lines
crossing the two edges. There is no explicit mark for noncolliders,
with the understanding that all and only unshielded triples that are
not oriented as colliders or marked as ambiguous are (implicitly)
marked noncolliders. (b)-(d) are the three patterns represented by (a).

The Adjacency-Faithfulness assumption may be further weakened. In an
earlier paper (Zhang and Spirtes, 2008), we showed that some violations
of the Adjacency-Faithfulness assumption are also detectable, and we
specified some conditions weaker than the Adjacency-Faithfulness
assumption under which any violation of Faithfulness (and so any
violation of Adjacency-Faithfulness) is detectable. One of the weaker
conditions is known as the Causal Minimality assumption (Spirtes,
Glymour and Scheines, 1993, page 31), which states that the true causal
DAG is a minimal DAG that satisfies the Markov condition with the true
probability distribution, minimal in the sense that no proper subgraph
satisfies the Markov condition. This condition is a consequence of the
Adjacency-Faithfulness assumption. If the Adjacency-Faithfulness
assumption is true, then no edge can be taken away from the true causal
DAG without violating the Markov condition.

The other weaker condition is named Triangle-Faithfulness:

Triangle-Faithfulness assumption. Suppose the true causal DAG of V is
G. Let X, Y, Z be any three variables that form a triangle in G (i.e.,
each pair of vertices is adjacent):

1. If Y is a noncollider on the path ⟨X, Y, Z⟩, then X and Z are not
independent conditional on any subset of V \ {X, Z} that does not
contain Y;

2. If Y is a collider on the path ⟨X, Y, Z⟩, then X and Z are not
independent conditional on any subset of V \ {X, Z} that contains Y.
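To see what the assumption rules out, one can enumerate the triangles of a given DAG and look for independencies that either clause forbids. The sketch below is illustrative only: it assumes a DAG stored as (parent, child) pairs and an oracle function `indep(x, z, s)` over frozensets, and the function name is hypothetical. The example oracle is hand-written for a triangle in which X and Z are marginally independent, as happens when the effect of X on Z through Y cancels the direct effect.

```python
from itertools import combinations

def triangle_faithfulness_violations(variables, edges, indep):
    """List (x, y, z, s) where I(x, z | s) contradicts Triangle-Faithfulness."""
    variables = sorted(variables)

    def adj(a, b):
        return (a, b) in edges or (b, a) in edges

    violations = []
    for y in variables:
        nbrs = [v for v in variables if v != y and adj(v, y)]
        for x, z in combinations(nbrs, 2):
            if not adj(x, z):
                continue  # only triangles are constrained
            collider = (x, y) in edges and (z, y) in edges
            others = sorted(set(variables) - {x, z})
            for r in range(len(others) + 1):
                for s in combinations(others, r):
                    # Clause 1 covers sets without y (noncollider case);
                    # clause 2 covers sets with y (collider case).
                    if (y in s) == collider and indep(x, z, frozenset(s)):
                        violations.append((x, y, z, frozenset(s)))
    return violations

triangle = {("X", "Y"), ("Y", "Z"), ("X", "Z")}

def cancelling_oracle(a, b, s):
    """Hand-written oracle: the only independence is I(X, Z | {})."""
    return {a, b} == {"X", "Z"} and len(s) == 0

found_violations = triangle_faithfulness_violations(["X", "Y", "Z"], triangle, cancelling_oracle)
```

Here Y is a noncollider on the path between X and Z, so the marginal independence of X and Z is exactly the kind of relation that clause 1 excludes.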

Clearly, the Adjacency-Faithfulness assumption entails the
Triangle-Faithfulness assumption, and the latter, intuitively, is much
weaker. Our result in Zhang and Spirtes (2008) is that given the Causal
Markov, Minimality and Triangle-Faithfulness assumptions, any violation
of faithfulness is detectable. But we did not propose any algorithm
that is provably correct under the Markov, Minimality and
Triangle-Faithfulness assumptions.

What need we modify in the SGS algorithm if all we can assume are the
Markov, Minimality and Triangle-Faithfulness assumptions? In step S2,
the inferred adjacencies are still correct, which, as already
mentioned, is guaranteed by the Causal Markov assumption alone. The
inferred nonadjacencies, however, are not necessarily correct, because
the Adjacency-Faithfulness assumption might fail. So the first
modification we need make is to acknowledge that the nonadjacencies
resulting from S2 are only "apparent" but not "definite": there might
still be an edge between two variables even though the edge between
them was removed in S2 because a screen-off set was found.

Since we do not assume the Orientation-Faithfulness assumption,
obviously we need at least modify S3 into S3*. A further worry is that
the unshielded triples resulting from S2 are only "apparent": they
might be shielded in the true causal DAG but appear to be unshielded
due to a failure of Adjacency-Faithfulness. Fortunately, this
possibility does not affect the soundness of S3*. Take clause (i) for
example. For an apparently unshielded triple ⟨X, Y, Z⟩, either X and Z
are really nonadjacent in the true DAG or they are adjacent. In the
former case, clause (i) is sound by the Markov assumption. In the
latter case, clause (i) is still sound by the Triangle-Faithfulness
assumption. A similar argument shows that clause (ii) is also sound. So
S3* is still sound. Moreover, clause (iii) can now play a bigger role
than simply conceding ignorance or ambiguity. If the antecedent of
clause (iii) is satisfied, then one can infer that X and Z are really
nonadjacent, for otherwise the Triangle-Faithfulness assumption would
be violated no matter whether ⟨X, Y, Z⟩ is a collider or not.
The soundness of S4 is obviously not affected. Therefore, if we only
assume the Causal Markov, Minimality and Triangle-Faithfulness
assumptions, the CSGS algorithm is still correct if we take the
nonadjacencies in its output as uninformative (except for those
warranted by S3*).

The question now is whether we can somehow 
test the Adjacency-Faithfulness assumption in the 
procedure and confirm the nonadjacencies when the 
test returns affirmative. The following lemma gives 
a sufficient condition for verifying the Adjacency- 
Faithfulness assumption and hence the nonadjacen¬ 
cies in the CSGS output. (Recall that the CSGS 
output in general represents a set of patterns, and 
each pattern represents a set of Markov equivalent 
DAGs.) A pattern O is Markov to an oracle when 
for every DAG represented by O, each vertex is in¬ 
dependent of the set of variables that are neither 
descendants nor parents in the DAG conditional on 
the parents in the DAG according to the oracle. 

Lemma 1. Suppose the Causal Markov, Min¬ 
imality and Triangle-Faithfulness assumptions are 
true, and E is the output of CSGS given a perfectly 
reliable oracle of conditional independence. If every 
pattern in the set represented by E is Markov to the 
oracle, then the true causal DAG has exactly those 
adjacencies present in E. 

Proof. As we already pointed out, the true
causal DAG, G_T, must have at least the adjacencies
in E (in order to satisfy the Causal Markov assumption),
and must have the colliders and noncolliders
in E (in order to satisfy the Causal Markov and
the Triangle-Faithfulness assumptions). Now suppose
every pattern in the set represented by E is
Markov to the oracle, and suppose, for the sake of
contradiction, that G_T has still more adjacencies.
Let G be the proper subgraph of G_T with just the
adjacencies in E. Then every unshielded collider and
every unshielded noncollider in E are also present
in G, and other unshielded triples in G, if any, are
ambiguous in E. Thus, the pattern that represents
the Markov equivalence class of G is in the set represented
by E. It follows that G is Markov to the
oracle, which shows that G_T is not a minimal graph
that is Markov to the oracle. This contradicts the
Causal Minimality assumption. Therefore, G_T has
exactly the adjacencies present in E. □

So we have the following Very Conservative SGS 
(VCSGS): 

VCSGS algorithm.

V1. Form the complete undirected graph H on the
given set of variables V.

V2. For each pair of variables X and Y in V, search
for a subset S of V \ {X, Y} such that X and
Y are independent conditional on S. Remove
the edge between X and Y in H and mark the
pair {X, Y} as “apparently nonadjacent,” if and
only if such a set is found.

V3. Let K be the graph resulting from V2. For each
apparently unshielded triple ⟨X, Y, Z⟩ (i.e., X
and Y are adjacent, Y and Z are adjacent, but
X and Z are apparently nonadjacent),

(i) If X and Z are not independent conditional
on any subset of V \ {X, Z} that
contains Y, then orient the triple as a collider:
X → Y ← Z.

(ii) If X and Z are not independent conditional
on any subset of V \ {X, Z} that
does not contain Y, then mark the triple
as a noncollider.

(iii) Otherwise, mark the triple as ambiguous
(or unfaithful), and mark the pair {X, Z}
as “definitely nonadjacent.”

V4. Execute the same orientation rules as in S4, until
none of them applies.

V5. Let M be the graph resulting from V4. For
each consistent disambiguation of the ambiguous
triples in M (i.e., each disambiguation that
leads to a pattern), test whether the resulting
pattern satisfies the Markov condition. If every
pattern does, then mark all the “apparently
nonadjacent” pairs as “definitely nonadjacent.”

[An obvious way to test the Markov condition in
V5 on a given pattern is to extend the pattern to
a DAG and test the local Markov condition. That
is, we need to test, for each variable X, whether X
is independent of the variables that are neither its
descendants nor its parents conditional on its parents.
In linear Gaussian models, this can be done
by regressing X on its nondescendants and testing
whether the regression coefficients are zero for its
nonparents. More generally, assuming composition,
we need only run a conditional independence test
for each nonadjacent pair, and, thus, in the worst
case the number of conditional independence tests is
O(n^2), where n is the number of vertices. The number
of patterns to be tested in V5 is O(2^a), where a
is the number of ambiguous unshielded triples.]
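
The regression-based check described in the bracketed remark can be sketched at the population level as follows. This is only an illustration under stated assumptions: the function names `descendants` and `is_locally_markov`, the dictionary encoding of a DAG, the tolerance `tol` and the example coefficients are all hypothetical, and an exact covariance matrix stands in for the oracle. A sample version would replace the numerical zero check with statistical tests of the regression coefficients.

```python
import numpy as np

def descendants(parents, v):
    """Return all proper descendants of v in a DAG given as {node: [parents]}."""
    children = {u: [w for w in parents if u in parents[w]] for u in parents}
    seen, stack = set(), [v]
    while stack:
        for c in children[stack.pop()]:
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen

def is_locally_markov(cov, parents, tol=1e-9):
    """Population check of the local Markov condition for a linear Gaussian model:
    regress each node on its nondescendants and require (numerically) zero
    coefficients on every nonparent."""
    for v in parents:
        desc = descendants(parents, v)
        nondesc = [u for u in parents if u != v and u not in desc]
        if not nondesc:
            continue
        # Population regression coefficients of v on its nondescendants.
        beta = np.linalg.solve(cov[np.ix_(nondesc, nondesc)], cov[nondesc, v])
        for u, b in zip(nondesc, beta):
            if u not in parents[v] and abs(b) > tol:
                return False
    return True

# Chain 0 -> 1 -> 2 with unit error variances (illustrative numbers only).
B = np.array([[0.0, 0.0, 0.0], [0.6, 0.0, 0.0], [0.0, 0.7, 0.0]])
A = np.linalg.inv(np.eye(3) - B)
cov = A @ A.T

true_dag = {0: [], 1: [0], 2: [1]}
too_sparse = {0: [], 1: [0], 2: []}  # drops the edge 1 -> 2
markov_true = is_locally_markov(cov, true_dag)
markov_sparse = is_locally_markov(cov, too_sparse)
```

In this toy case the true chain passes the check, while the graph that omits the edge into the last variable fails it, which is the kind of failure step V5 is meant to detect.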

As we already explained, steps V1-V4 are sound
under the Causal Markov, Minimality and Triangle-Faithfulness
assumptions. Lemma 1 shows that V5 is
also sound. Hence, the VCSGS algorithm is correct
under the Causal Markov, Minimality and Triangle-Faithfulness
assumptions, in the sense that given a
perfectly reliable oracle of conditional independence,
all the adjacencies, definite nonadjacencies, directed
edges and marked noncolliders are correct. Moreover,
when the Causal Faithfulness assumption happens
to hold, the CSGS output will be a single pattern
and this single pattern will satisfy the Markov
condition; hence, the VCSGS algorithm will return
a single pattern with full information about nonadjacencies.
Therefore, VCSGS is also as informative
as SGS when the Causal Faithfulness assumption
happens to be true.

One might think (or hope) that the VCSGS algorithm
is as informative as the CSGS algorithm
when Adjacency-Faithfulness (but not Orientation-Faithfulness)
happens to hold. Unfortunately this is
not true in general because the sufficient condition
given in Lemma 1 (and checked in V5) is not necessary
for the Adjacency-Faithfulness assumption.

To illustrate, consider the following example. Suppose
the true causal DAG is the one given in Figure 2(a).
Suppose the Causal Markov assumption
and the Adjacency-Faithfulness assumption are satisfied.
And suppose that, besides the conditional
independence relations entailed by the graph, the
true distribution features one and only one extra
conditional independence: I(X, Z | Y), due, for example,
to some sort of balancing-out of the path
⟨X, Y, Z⟩ (active conditional on {Y}) and the path
⟨X, W, Z⟩ (active conditional on {Y}). This violates
the Orientation-Faithfulness assumption. The
CSGS output will thus be the graph in Figure 2(b),
in which both the triple ⟨X, Y, Z⟩ and the triple
⟨X, W, Z⟩ are ambiguous. This output represents a
set of three patterns, as shown in Figure 2(c)-(e).
(Again, the two ambiguous triples cannot be noncolliders
at the same time.) However, only the patterns
in Figure 2(c) and 2(d) satisfy the Markov
condition. The pattern in Figure 2(e) violates the
Markov condition because it entails that I(X, Z | ∅),
which is not true.

[Figure 2: panels (a)-(e)]

Fig. 2. An example in which the test in step V5 of VCSGS
does not confirm the nonadjacencies even though the nonadjacencies
are correct.

For this example, then, the VCSGS will not return
the full information of nonadjacencies, even though
the Adjacency-Faithfulness assumption is true.

In light of this example, it is natural to consider
the following variant of step V5 in VCSGS:

V5*. Let M be the graph resulting from V4. If
some disambiguation of the ambiguous triples in M
leads to a pattern that satisfies the Markov condition,
then mark all remaining “apparently nonadjacent”
pairs as “definitely nonadjacent.”

We suspect that V5* is also sound under the 
Causal Markov, Minimality and Triangle-Faithfulness 
assumptions, but we have not found a proof. In other 
words, we conjecture that the sufficient condition
presented in Lemma 1 can be weakened to the condition
that some pattern in the set represented by the CSGS
output satisfies the Markov condition. (This conjecture
is a consequence of the following plausible conjec¬ 
ture: Suppose a DAG G and a probability distribu¬ 
tion P satisfy the Markov, Minimality and Triangle- 
Faithfulness conditions. Then no DAG with strictly 
fewer adjacencies than in G is Markov to P. We 
thank an anonymous referee for making the point 
and the conjecture.) Note that if the Adjacency- 
Faithfulness assumption happens to hold, then at 
least one pattern (i.e., the pattern representing the 
true causal DAG) satisfies the Markov condition. 
Therefore, if our conjecture is true, we can replace
V5 with V5* in the VCSGS algorithm, and the condition
tested in V5* is both sufficient and necessary
for Adjacency-Faithfulness. The resulting algorithm
will then be as informative as the CSGS algorithm
whenever the Adjacency-Faithfulness assumption
happens to hold, and as informative as the SGS algorithm
whenever both the Adjacency-Faithfulness
assumption and the Orientation-Faithfulness assumption
happen to hold.

It is worth noting that if we adopt a natural, interventionist
conception of causation (e.g., Woodward
(2003)), the Causal Minimality assumption is guaranteed
to be true if the probability distribution is
positive (Zhang and Spirtes, 2011). Since positivity 
is a property of the probability distribution alone, 
we may also try to incorporate a test of positivity 
at the beginning of VCSGS, and proceed only if the 
test returns affirmative. We then need not assume 
the Causal Minimality assumption in order to jus¬ 
tify the procedure. 

4. WEAKENING THE STRONG CAUSAL 
FAITHFULNESS ASSUMPTION 

In this section we consider sample versions of the
CSGS and VCSGS algorithms, assuming Gaussianity
and linearity, and prove some positive results
on uniform consistency, under a generalization and
strengthening of the Triangle-Faithfulness assumption,
which we call the k-Triangle-Faithfulness assumption.

If a model M does not satisfy the Causal Faithfulness
assumption, then M contains a zero partial
correlation $\rho_M(X, Y \mid \mathbf{W})$ even though the Causal
Markov assumption does not entail that $\rho_M(X, Y \mid \mathbf{W})$
is zero. If $\rho_M(X, Y \mid \mathbf{W}) = 0$ but is not entailed to be
zero for all values of the parameters, the parameters
of the model satisfy an algebraic constraint. A set
of parameters that satisfies such an algebraic constraint
is a “surface of unfaithfulness” in the parameter
space that is of a lower dimension than the full
parameter space. Such a surface of unfaithfulness
has Lebesgue measure zero. For a Bayesian
with a prior probability over the parameter space
that is absolutely continuous with respect to Lebesgue
measure, the prior probability of unfaithfulness is zero.

However, in practice, the SGS (or PC) algorithm
does not have access to the population correlation
coefficients. Instead it performs statistical
tests of whether a partial correlation is zero. If
$|\rho_M(X, Y \mid \mathbf{W})|$ is small enough, then with high probability
a statistical test of whether $\rho_M(X, Y \mid \mathbf{W})$
equals zero will not reject the null hypothesis. If
$\rho_M(X, Y \mid \mathbf{W}) = 0$ fails to be rejected, this can lead
to some edges that occur in the true causal DAG
not appearing in the output of SGS and to errors in
the orientation of edges in the output of SGS. (Such
errors can also cause the output of the SGS algorithm
to fail to be a pattern, either because it contains
double-headed edges or undirected nonchordal
cycles.) Robins et al. (2003) showed that even if it is
assumed that there are no unfaithful models, there
are always models so “close to unfaithful” [i.e., with
$|\rho_M(X, Y \mid \mathbf{W})|$ nonzero but small enough that a statistical
test will probably fail to reject the null hypothesis]
that there is no algorithm that is a uniformly
consistent estimator of the pattern of a causal
model.
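
To make the "close to unfaithful" problem concrete, here is a minimal sketch of a test of zero partial correlation based on Fisher's z-transform, using only the Python standard library. The function name `fisher_z_pvalue`, the sample size and the two correlation values are illustrative assumptions, not taken from the paper. The point is only that a small nonzero partial correlation is typically not rejected at a moderate sample size.

```python
import math

def fisher_z_pvalue(r, n, cond_size):
    """Two-sided p-value for H0: partial correlation = 0, via Fisher's z-transform.

    r is a sample partial correlation, n the sample size and cond_size the
    number of variables conditioned on.
    """
    z = 0.5 * math.log((1.0 + r) / (1.0 - r))
    stat = math.sqrt(n - cond_size - 3) * abs(z)
    # 2 * (1 - Phi(stat)) written with the complementary error function.
    return math.erfc(stat / math.sqrt(2.0))

# A tiny but nonzero partial correlation versus a clearly nonzero one.
p_near_zero = fisher_z_pvalue(r=0.01, n=1000, cond_size=1)
p_clear = fisher_z_pvalue(r=0.30, n=1000, cond_size=1)
```

With these illustrative numbers the first p-value is far above conventional significance levels, so the test would accept the null hypothesis even though the correlation is nonzero, while the second is rejected decisively.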

Kalisch and Bühlmann (2007) showed that under
a strengthened version of the Causal Faithfulness
assumption, the PC algorithm is a uniformly consistent
estimator of the pattern that represents the
true causal DAG. Let n be the sample size. Their
strengthened set of assumptions was as follows:

(A1) The distribution $P_n$ is multivariate Gaussian
and faithful to the DAG $G_n$ for all n.

(A2) The dimension $p_n = O(n^a)$ for some $0 \le a < \infty$.

(A3) The maximal number of neighbors in the
DAG $G_n$ is denoted by

$$q_n = \max_{1 \le j \le p_n} |\mathrm{adj}(G, j)|$$

with $q_n = O(n^{1-b})$ for some $0 < b \le 1$.

(A4) The partial correlations between $X^{(i)}$ and
$X^{(j)}$ given $\{X^{(r)}; r \in \mathbf{k}\}$ for some set $\mathbf{k} \subseteq \{1, \ldots, p_n\} \setminus \{i, j\}$
are denoted by $\rho_{n;i,j|\mathbf{k}}$. Their absolute values
are bounded from below and above:

$$\inf\{|\rho_{i,j|\mathbf{k}}|;\ i, j, \mathbf{k} \text{ with } \rho_{i,j|\mathbf{k}} \ne 0\} \ge c_n, \qquad c_n^{-1} = O(n^d) \text{ for some } 0 < d < b/2,$$

$$\sup_{n;i,j,\mathbf{k}} |\rho_{i,j|\mathbf{k}}| \le M < 1,$$

where $0 < b \le 1$ is as in (A3).

We will refer to the assumption that all nonzero
partial correlations are bounded below in absolute
value by a number greater than zero [as in the first
part of (A4)] as the Strong Causal Faithfulness assumption.
Uhler et al. (2013) provide some reason to
believe that unless $c_n$ is quite small, the probability
of violating the Strong Causal Faithfulness assumption
is high, especially when the number of variables is
large. [This problem with assumption (A4) is somewhat
mitigated by the fact that the size of $c_n$ can decrease
with increasing sample size. But see Lin et al.
(2012), for an interesting analysis of the asymptotics
when $c_n$ approaches zero.]

It is difficult to see how a uniformly consistent
estimator of a causal pattern would be possible
without assuming something like the Strong Causal
Faithfulness assumption. However, what we will
show is that it is possible to weaken the Strong 
Causal Faithfulness assumption in several ways as 
long as the standard of success is not finding a uni¬ 
formly consistent estimator of the causal pattern, 
but is instead finding a uniformly consistent estima¬ 
tor of (some of) the structural coefficients in a pat¬ 
tern. The latter standard is compatible with missing 
some edges that are in the true causal graph, as long 
as the edges that have not been included in the out¬ 
put have sufficiently small structural coefficients. 

We propose to replace the faithfulness assumption
in (A1) and the Strong Faithfulness assumption
with the following assumption, where $e_M(X \to Z)$,
as we explained in Section 2, denotes the structural
coefficient associated with the edge between X and
Z.

k-Triangle-Faithfulness assumption. Given a set of
variables V, suppose the true causal model over V
is M = ⟨P, G⟩, where P is a Gaussian distribution
over V, and G is a DAG with vertices V. For any
three variables X, Y, Z that form a triangle in G
(i.e., each pair of vertices is adjacent),

1. If Y is a noncollider on the path ⟨X, Y, Z⟩, then
$|\rho_M(X, Z \mid \mathbf{W})| \ge k \times |e_M(X \to Z)|$ for all $\mathbf{W} \subseteq \mathbf{V}$
that do not contain Y; and

2. If Y is a collider on the path ⟨X, Y, Z⟩, then
$|\rho_M(X, Z \mid \mathbf{W})| \ge k \times |e_M(X \to Z)|$ for all $\mathbf{W} \subseteq \mathbf{V}$
that do contain Y.

As k approaches 0, the k-Triangle-Faithfulness
assumption approaches the Triangle-Faithfulness
assumption. For (small) k > 0, the k-Triangle-Faithfulness
assumption prohibits not only exact
cancellations of active paths in a triangle, but also
near cancellations.

The k-Triangle-Faithfulness assumption is a weakening
of the Strong Causal Faithfulness assumption
in two ways. First, Triangle-Faithfulness is significantly
weaker than Faithfulness. Second, it does not
entail a lower limit on the size of nonzero partial
correlations; it only puts a limit on the size of a
nonzero partial correlation in relation to the size of
the structural coefficient of an edge that occurs in a
triangle.
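
As a small worked illustration of what the assumption rules out, consider a three-variable triangle X → Y → Z with X → Z and standardized variables, so that the only conditioning set not containing Y is the empty set and the marginal correlation of X and Z equals the direct coefficient plus the product of the coefficients along X → Y → Z. The sketch below checks the noncollider clause for that single triple; the function name and the numerical values of `a`, `b`, `c` and `k` are hypothetical and chosen only to exhibit a near cancellation.

```python
def satisfies_k_triangle(a, b, c, k):
    """Noncollider clause for the triangle X -> Y -> Z, X -> Z with W = empty set.

    a, b, c are (hypothetical) standardized coefficients of X -> Y, Y -> Z
    and X -> Z, so corr(X, Z) = c + a * b.
    """
    rho_xz = c + a * b
    return abs(rho_xz) >= k * abs(c)

# Direct and indirect paths reinforce each other: the clause holds.
ok = satisfies_k_triangle(a=0.5, b=0.6, c=0.20, k=0.1)
# Direct path almost cancels the indirect one: corr(X, Z) is about 0.01,
# which is below k * |c| = 0.029, so the clause fails.
near_cancel = satisfies_k_triangle(a=0.5, b=0.6, c=-0.29, k=0.1)
```

Note that in the second case the correlation is not exactly zero, so plain Triangle-Faithfulness would be satisfied; it is only the k-version that excludes it.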

The Strong Causal Faithfulness assumption entails
that there are no very small structural coefficients
(which, if present, entail the existence of
some partial correlation that is very small). In contrast,
the k-Triangle-Faithfulness assumption does
not entail that there are no nonzero but very small
structural coefficients. However, there is a price to
be paid for weakening the Strong Causal Faithfulness
assumption: the estimator we propose is both
computationally more intensive than the PC algorithm
used in Kalisch and Bühlmann (2007) and
also requires testing partial correlations conditional
on larger sets of variables, which means some of the
tests performed have lower power than the tests performed
in the PC algorithm.

Our results also depend on the following assumptions.
First, we assume a fixed upper bound to the
size of the set of variables that does not change as
sample size increases. We have no reason to think
that there are not analogous results that would hold
even if, as in Kalisch and Bühlmann (2007), the
number of variables and the degree of the graph increased
with the sample size; however, we have not
proved any such results yet. We also make the assumption
of nonvanishing variance (NVV) and the
assumption of upper bound for partial correlations
(UBC):

Assumption NVV(J).

$$\inf_{X_i \in \mathbf{V}} \mathrm{var}_M(X_i \mid \mathbf{V} \setminus \{X_i\}) \ge J$$

for some (small) J > 0.

Assumption UBC(C).

$$\sup_{X_i, X_j \in \mathbf{V},\ \mathbf{W} \subseteq \mathbf{V} \setminus \{X_i, X_j\}} |\rho_M(X_i, X_j \mid \mathbf{W})| \le C$$

for some C < 1.
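
For a given population correlation matrix, the largest J and smallest C for which NVV(J) and UBC(C) hold can be computed by brute force. The following sketch is illustrative only: the function names and the example matrix are assumptions, and the enumeration over all conditioning sets is exponential in the number of variables, so it is meant for small examples.

```python
import itertools
import numpy as np

def nvv_constant(corr):
    """Smallest conditional variance var(X_i | all other variables)."""
    return float((1.0 / np.diag(np.linalg.inv(corr))).min())

def ubc_constant(corr):
    """Largest |partial correlation| over all pairs and all conditioning sets."""
    n = corr.shape[0]
    worst = 0.0
    for i, j in itertools.combinations(range(n), 2):
        rest = [v for v in range(n) if v not in (i, j)]
        for size in range(len(rest) + 1):
            for w in itertools.combinations(rest, size):
                idx = [i, j, *w]
                # Partial correlation from the precision matrix of the subvector.
                prec = np.linalg.inv(corr[np.ix_(idx, idx)])
                rho = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])
                worst = max(worst, abs(rho))
    return float(worst)

# A hypothetical 3 x 3 correlation matrix.
corr = np.array([[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]])
J = nvv_constant(corr)
C = ubc_constant(corr)
```

For standardized variables J can be at most 1, and C is at least as large as the largest marginal correlation in absolute value.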

The assumption NVV is a slight strengthening
of the positivity requirement, which, as we noted
in the previous section, is needed to guarantee the
Causal Minimality assumption. Uniform consistency
requires that the distributions be bounded away
from nonpositivity.

The assumption UBC [cf. the second part of assumption
(A4)] is used to guarantee that sample
partial correlations are uniformly consistent estimators
of population partial correlations (Kalisch and
Bühlmann (2007)).

We now proceed to establish two positive results
about uniform consistency. In Section 4.1 we show
that the Conservative SGS (CSGS) algorithm, using
uniformly consistent tests of partial correlations,
is uniformly consistent in inferring certain features
of the causal structure. In Section 4.2 we show that
the Very Conservative SGS (VCSGS) algorithm,
when combined with a uniformly consistent procedure
for estimating structural coefficients, provides
a uniformly consistent estimator of structural coefficients
(that returns “Unknown” in some, but not
all cases).

4.1 Uniform Consistency in the Inference of 
Structure 

Recall that the CSGS algorithm, given a perfect 
oracle of conditional independence, is correct un¬ 
der the Causal Markov, Minimality and Triangle- 
Faithfulness assumptions, in the sense that the ad¬ 
jacencies, orientations and marked noncolliders in 
the output are all correct. In Gaussian models, we
can implement the oracle with tests of zero partial
correlations. A test $\varphi$ of $H_0: \rho = 0$ versus $H_1: \rho \ne 0$
is a family of functions $\varphi_1, \ldots, \varphi_n, \ldots$, one for each
sample size, that takes an i.i.d. sample $V_n$ from the
joint distribution over V and returns 0 (acceptance
of $H_0$) or 1 (rejection of $H_0$). Such a test is uniformly
consistent with respect to a set of distributions $\Omega$ if
and only if

1. $\lim_{n \to \infty} \sup_{P \in \Omega,\ \rho(P) = 0} P^n(\varphi_n(V_n) = 1) = 0$, and

2. for every $\delta > 0$,

$$\lim_{n \to \infty} \sup_{P \in \Omega,\ |\rho(P)| \ge \delta} P^n(\varphi_n(V_n) = 0) = 0.$$

For simplicity, we assume the variables in V are 
standardized. Under the assumption UBC, there 
are uniformly consistent tests of partial correla¬ 
tions based on sample partial correlations, such as 
Fisher’s z test (Robins et al. (2003); Kalisch and
Bühlmann (2007)). We consider a sample version of
the CSGS algorithm in which the oracle is replaced
by uniformly consistent tests of zero partial correlations
in the adjacency step S2. In the orientation
phase, the step S3* is refined as follows, based on a
user-chosen parameter L.

S3* (sample version). Let K be the undirected
graph resulting from the adjacency phase. For each
unshielded triple ⟨X, Y, Z⟩,

1. If there is a set W not containing Y such that
the test of $\rho(X, Z \mid \mathbf{W}) = 0$ returns 0 (i.e., accepts
the hypothesis), and for every set U that contains
Y, the test of $|\rho(X, Z \mid \mathbf{U})| = 0$ returns 1 (i.e., rejects
the hypothesis), and the test of $|\rho(X, Z \mid \mathbf{U}) - \rho(X, Z \mid \mathbf{W})| > L$
returns 0 (i.e., accepts the hypothesis),
then orient the triple as a collider: X → Y ← Z.

2. If there is a set W containing Y such that the
test of $\rho(X, Z \mid \mathbf{W}) = 0$ returns 0 (i.e., accepts the
hypothesis), and for every set U that does not contain
Y, the test of $|\rho(X, Z \mid \mathbf{U})| = 0$ returns 1 (i.e.,
rejects the hypothesis), and the test of $|\rho(X, Z \mid \mathbf{U}) - \rho(X, Z \mid \mathbf{W})| > L$
returns 0 (i.e., accepts the hypothesis),
then mark the triple as a noncollider.

3. Otherwise, mark the triple as ambiguous.

Larger values of L return “Unknown” more often 
than smaller values of L, but reduce the probability 
of an error in orientation at a given sample size. 

Step S4 remains the same as in the population 
version. 

Given any causal model M = ⟨P, G⟩ over V,
let C(L, n, M) denote the (random) output of the
CSGS algorithm with parameter L, given an i.i.d.
sample of size n from the distribution $P_M$. Say that
C(L, n, M) errs if it contains (i) an adjacency not in
$G_M$, or (ii) a marked noncollider not in $G_M$, or (iii)
an orientation not in $G_M$.^1

Let $\phi^{k,J,C}$ be the set of causal models over V that
respect the k-Triangle-Faithfulness assumption and
the assumptions of NVV(J) and UBC(C). We shall
prove that given the causal sufficiency of the measured
variables V and the Causal Markov assumption,

$$\lim_{n \to \infty} \sup_{M \in \phi^{k,J,C}} P_M^n(C(L, n, M) \text{ errs}) = 0.$$

In other words, given the causal sufficiency of V, the
Causal Markov, k-Triangle-Faithfulness, NVV(J)
and UBC(C) assumptions, the CSGS algorithm is
uniformly consistent in that the probability of it
making a mistake uniformly converges to zero in the
large sample limit.

First of all, we prove a useful lemma: 

Lemma 2. Let $M \in \phi^{k,J,C}$. For any $X_i$ and $X_j$
such that $X_j$ is not an ancestor of $X_i$, if $e_M(X_i \to X_j) = b_{j,i}$, then

$$\frac{|b_{j,i}|}{\sqrt{J}} \ge |\rho_M(X_i, X_j \mid \mathbf{X}[1, \ldots, j-1] \setminus \{X_i\})| \ge |b_{j,i}| \sqrt{J},$$

where $\mathbf{X}[1, \ldots, j]$ is an ancestral set that contains $X_i$
but does not contain any descendant of $X_j$.

Proof. Let $\Sigma$ be the correlation matrix for the
set of variables $\{X_1, \ldots, X_j\}$, and $R = \Sigma^{-1}$. Let B
be the (lower-triangular) matrix of structural coefficients
in M restricted to $\{X_1, \ldots, X_j\}$, and var(E)
be the (diagonal) covariance matrix for the error
terms $\{\varepsilon_1, \ldots, \varepsilon_j\}$. Then

$$R = (I - B)^T \, \mathrm{var}(E)^{-1} \, (I - B).$$

Note that

$$(I - B) = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ -b_{2,1} & 1 & \cdots & 0 \\ \vdots & & \ddots & 0 \\ -b_{j,1} & \cdots & & 1 \end{pmatrix}, \qquad \mathrm{var}(E)^{-1} = \begin{pmatrix} 1/\varepsilon_1 & 0 & \cdots & 0 \\ 0 & 1/\varepsilon_2 & \cdots & 0 \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & 1/\varepsilon_j \end{pmatrix},$$

where the b's are the corresponding structural coefficients
in M, and the $\varepsilon$'s are the variances of the
corresponding error terms. Thus, $R[j, j] = 1/\varepsilon_j$, and
$R[i, j] = -b_{j,i}/\varepsilon_j$. So we have (Whittaker, 1990)

$$\rho_M(X_i, X_j \mid \mathbf{X}[1, \ldots, j-1] \setminus \{X_i\}) = -\frac{R[i, j]}{(R[i, i]\, R[j, j])^{1/2}} = b_{j,i} \, \frac{(R[i, i]^{-1})^{1/2}}{\varepsilon_j^{1/2}}.$$

Since $R[i, i]^{-1}$ is the variance of $X_i$ conditional on
all of the other variables in $\{X_1, \ldots, X_j\}$, which is a
subset of $\mathbf{V} \setminus \{X_i\}$, $R[i, i]^{-1} \ge \mathrm{var}_M(X_i \mid \mathbf{V} \setminus \{X_i\}) \ge J$.
Since the variables are standardized and the
residual of $X_i$ regressed on the other variables is
uncorrelated with $X_i$, $R[i, i]^{-1} \le 1$. Similarly, $1 \ge \varepsilon_j \ge J$.
Thus,

$$\frac{|b_{j,i}|}{\sqrt{J}} \ge |\rho_M(X_i, X_j \mid \mathbf{X}[1, \ldots, j-1] \setminus \{X_i\})| \ge |b_{j,i}| \sqrt{J}.$$

□

^1 Note that at this stage we are taking nonadjacencies as
uninformative, and not counting any missing edge as an error.
So an algorithm that always returns a structure with no
edges is treated as totally uninformative and hence trivially
consistent, in the sense of triviality defined in Robins et al.
(2003). The CSGS algorithm is obviously nontrivial in that it
does not always return a completely uninformative answer.
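
The bounds in Lemma 2 can be checked numerically on a small example. The sketch below builds a hypothetical standardized three-variable linear model (the coefficients `b21`, `b31`, `b32` are illustrative assumptions), forms the implied correlation matrix, and compares the partial correlation of the first and last variables given the middle one with the two bounds, taking J to be the smallest conditional variance in this particular model.

```python
import numpy as np

# Hypothetical standardized model: X1 -> X2, X1 -> X3, X2 -> X3.
b21, b31, b32 = 0.5, 0.3, -0.4
B = np.array([[0.0, 0.0, 0.0],
              [b21, 0.0, 0.0],
              [b31, b32, 0.0]])

# Error variances chosen so that every variable has unit variance.
eps = np.array([1.0,
                1.0 - b21 ** 2,
                1.0 - (b31 ** 2 + b32 ** 2 + 2.0 * b31 * b32 * b21)])

A = np.linalg.inv(np.eye(3) - B)
Sigma = A @ np.diag(eps) @ A.T          # implied correlation matrix
R = np.linalg.inv(Sigma)                # R = (I - B)^T var(E)^{-1} (I - B)

# Partial correlation of X1 and X3 given X2, read off the precision matrix.
rho = -R[0, 2] / np.sqrt(R[0, 0] * R[2, 2])

# Largest J for which NVV(J) holds in this model.
J = float((1.0 / np.diag(R)).min())
lower = abs(b31) * np.sqrt(J)
upper = abs(b31) / np.sqrt(J)
```

Here X3 is last in the causal order, so it plays the role of $X_j$ in the lemma and X1 the role of $X_i$; the two identities for the entries of R used in the proof can be verified on the same matrices.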


We now categorize the mistakes C(L, n, M) can
make into three kinds. C(L, n, M) errs in kind I if
C(L, n, M) has an adjacency that is not present in
$G_M$; C(L, n, M) errs in kind II if every adjacency
in C(L, n, M) is in $G_M$ but C(L, n, M) contains a
marked noncollider that is not in $G_M$; C(L, n, M)
errs in kind III if every adjacency in C(L, n, M) is
in $G_M$, every marked noncollider in C(L, n, M) is in
$G_M$, but C(L, n, M) contains an orientation that is
not in $G_M$. Obviously if C(L, n, M) errs, it errs in
at least one of the three kinds.

The following three lemmas show that for each
kind, the probability of C(L, n, M) erring in that
kind uniformly converges to zero.


Lemma 3. Given causal sufficiency of the measured
variables V, the Causal Markov, k-Triangle-Faithfulness,
NVV(J) and UBC(C) assumptions,

$$\lim_{n \to \infty} \sup_{M \in \phi^{k,J,C}} P_M^n(C(L, n, M) \text{ errs in kind I}) = 0.$$

Proof. C(L, n, M) has an adjacency not in $G_M$
only if some test of zero partial correlation falsely rejects
its null hypothesis. Since uniformly consistent
tests are used in CSGS, for every $\varepsilon > 0$, for every
test of zero partial correlation $t_i$, there is a sample
size $N_i$ such that for all $n > N_i$ the supremum (over
$\phi^{k,J,C}$) of the probability of the test falsely rejecting
its null hypothesis is less than $\varepsilon$. Given V, there
are only finitely many possible tests of zero partial
correlations. Thus, for every $\varepsilon > 0$, there is a sample
size N such that for all n > N, the supremum
(over $\phi^{k,J,C}$) of the probability of any of the tests
falsely rejecting its null hypothesis is less than $\varepsilon$.
The lemma then follows. □

Lemma 4. Given causal sufficiency of the measured
variables V, the Causal Markov, k-Triangle-Faithfulness,
NVV(J) and UBC(C) assumptions,

$$\lim_{n \to \infty} \sup_{M \in \phi^{k,J,C}} P_M^n(C(L, n, M) \text{ errs in kind II}) = 0.$$

Proof. For any $M \in \phi^{k,J,C}$, if C(L, n, M) errs
in kind II, then C(L, n, M) contains a marked noncollider,
say, ⟨X, Y, Z⟩, which is not in $G_M$, but every
adjacency in C(L, n, M) is also in $G_M$, including the
adjacency between X and Y, and that between Y
and Z. It follows that ⟨X, Y, Z⟩ is a collider in $G_M$.
Since CSGS marks a triple as a noncollider only if
the triple is unshielded, X and Z are not adjacent in
C(L, n, M). Hence, errors of kind II can be further
categorized into two cases: (II.1) C(L, n, M) contains
an unshielded noncollider that is an unshielded
collider in $G_M$, and (II.2) C(L, n, M) contains an
unshielded noncollider that is a shielded collider in
$G_M$. We show that the probability of either case
uniformly converges to zero.

For case (II.1) there is an unshielded collider
⟨X, Y, Z⟩ in $G_M$, so X and Z are independent conditional
on some set of variables W that does not contain
Y, by the Causal Markov assumption. Then the
CSGS algorithm (falsely) marks ⟨X, Y, Z⟩ as a noncollider
only if the test of $\rho_M(X, Z \mid \mathbf{W}) = 0$ (falsely)
rejects its null hypothesis. Therefore, the CSGS algorithm
gives rise to case (II.1) only if some test
of zero partial correlation falsely rejects its null hypothesis.
Then, by essentially the same argument as
the one used in proving Lemma 3, the probability
of case (II.1) uniformly converges to zero as sample
size increases.

For case (II.2), suppose for the sake of contradiction
that the probability of CSGS making such a
mistake does not uniformly converge to zero. Then
there exists $\varepsilon > 0$, such that for every sample size
n, there is a model M(n) such that the probability
that C(L, n, M(n)) contains an unshielded noncollider
that is a shielded collider in M(n) is greater than $\varepsilon$.

Now, C(L, n, M(n)) contains an unshielded noncollider
that is a shielded collider in $G_{M(n)}$, say
$\langle X^{M(n)}, Y^{M(n)}, Z^{M(n)} \rangle$, only if there is a set $\mathbf{W}^{M(n)}$
that contains $Y^{M(n)}$ such that the test of
$\rho(X^{M(n)}, Z^{M(n)} \mid \mathbf{W}^{M(n)}) = 0$ returns 0 (i.e., accepts the
hypothesis).

Without loss of generality, suppose $Z^{M(n)}$ is
not an ancestor of $X^{M(n)}$. Let $\mathbf{U}^{M(n)} = \mathbf{A}^{M(n)} \setminus \{X^{M(n)}, Z^{M(n)}\}$,
where $\mathbf{A}^{M(n)}$ is an ancestral set
that contains $X^{M(n)}$ and $Z^{M(n)}$ but no descendant
of $Z^{M(n)}$. Since $Y^{M(n)}$ is a child of $Z^{M(n)}$
in $G_{M(n)}$, $\mathbf{U}^{M(n)}$ does not contain $Y^{M(n)}$. Then
$\langle X^{M(n)}, Y^{M(n)}, Z^{M(n)} \rangle$ is marked as a noncollider
in C(L, n, M(n)) only if the test of
$|\rho(X^{M(n)}, Z^{M(n)} \mid \mathbf{U}^{M(n)}) - \rho(X^{M(n)}, Z^{M(n)} \mid \mathbf{W}^{M(n)})| > L$
returns 0 (i.e., accepts the hypothesis).

The test of $|\rho(X^{M(n)}, Z^{M(n)} \mid \mathbf{U}^{M(n)}) - \rho(X^{M(n)}, Z^{M(n)} \mid \mathbf{W}^{M(n)})| > L$
is denoted by $\varphi_{n(L)}$, and $\varphi_{n(0)}$ denotes the test of
$\rho(X^{M(n)}, Z^{M(n)} \mid \mathbf{W}^{M(n)}) = 0$. By our supposition,
$P_{M(n)}^n(\varphi_{n(0)} = 0 \text{ and } \varphi_{n(L)} = 0) > \varepsilon$.
It follows that for all n,

(1) $P_{M(n)}^n(\varphi_{n(0)} = 0) > \varepsilon$,

(2) $P_{M(n)}^n(\varphi_{n(L)} = 0) > \varepsilon$.

(1) implies that there exists $\delta_n$ such that
$|\rho(X^{M(n)}, Z^{M(n)} \mid \mathbf{W}^{M(n)})| < \delta_n$, and $\delta_n \to 0$ as $n \to \infty$ since
the tests are uniformly consistent. Hence
$|e_{M(n)}(X^{M(n)} \to Z^{M(n)})| < \delta_n / k$ by
k-Triangle-Faithfulness. By Lemma 2,

$$|\rho(X^{M(n)}, Z^{M(n)} \mid \mathbf{U}^{M(n)})| \le J^{-1/2} \, |e_{M(n)}(X^{M(n)} \to Z^{M(n)})| < \delta_n J^{-1/2} / k.$$

Thus,

$$|\rho(X^{M(n)}, Z^{M(n)} \mid \mathbf{U}^{M(n)}) - \rho(X^{M(n)}, Z^{M(n)} \mid \mathbf{W}^{M(n)})| < \delta_n (1 + J^{-1/2}/k) \to 0 \quad \text{as } n \to \infty.$$

Therefore, it is not true that (2) holds for all n, which is
a contradiction. So the initial supposition is false.
The probability of case (II.2) uniformly converges
to zero as sample size increases. □


Lemma 5. Given causal sufficiency of the measured
variables V, the Causal Markov, k-Triangle-Faithfulness,
NVV(J) and UBC(C) assumptions,

$$\lim_{n \to \infty} \sup_{M \in \phi^{k,J,C}} P_M^n(C(L, n, M) \text{ errs in kind III}) = 0.$$

Proof. Given that all the adjacencies and
marked noncolliders in C(L, n, M) are correct, there
is a mistaken orientation if and only if there is an
unshielded collider in C(L, n, M) which is not a collider
in $G_M$, for the other orientation rules in step S4
would not lead to any mistaken orientation if all the
unshielded colliders were correct. Thus, C(L, n, M)
errs in kind III only if there is a noncollider ⟨X, Y, Z⟩
in $G_M$ that is marked as an unshielded collider in
C(L, n, M).

There are then two cases to consider: (III.1)
C(L, n, M) contains an unshielded collider that
is an unshielded noncollider in $G_M$, and (III.2)
C(L, n, M) contains an unshielded collider that is a
shielded noncollider in $G_M$. The argument for case
(III.1) is extremely similar to that for (II.1) in the
proof of Lemma 4, and the argument for case (III.2)
is extremely similar to that for (II.2) in the proof of
Lemma 4. □

Theorem 1. Given causal sufficiency of the
measured variables V, the Causal Markov, k-Triangle-Faithfulness,
NVV(J) and UBC(C) assumptions,
the CSGS algorithm is uniformly consistent
in the sense that

$$\lim_{n \to \infty} \sup_{M \in \phi^{k,J,C}} P_M^n(C(L, n, M) \text{ errs}) = 0.$$

Proof. It follows from Lemmas 3-5 [and the
fact that C(L, n, M) errs if and only if it errs in one
of the three kinds]. □

4.2 Uniform Consistency in the Inference of
Structural Coefficients

We now combine the structure search with esti¬ 
mation of structural coefficients, when possible. 

Edge Estimation algorithm.

E1. Run the CSGS algorithm on an i.i.d. sample of
size n from $P_M$.

E2. Let the output from E1 be C(L, n, M). Apply
step V5 in the VCSGS algorithm (from Section 3),
using tests of zero partial correlations.

E3. If the nonadjacencies in C(L, n, M) are not confirmed
in E2, return “Unknown” for every pair
of variables.

E4. If the nonadjacencies in C(L, n, M) are confirmed
in E2, then

(i) For every nonadjacent pair ⟨X, Y⟩, let the
estimate e(X → Y) be 0.

(ii) For each vertex Z such that all of the edges
containing Z are oriented in C(L, n, M), if
Y is a parent of Z in C(L, n, M), let the
estimate e(Y → Z) be the sample regression
coefficient of Y in the regression of Z
on its parents in C(L, n, M).

(iii) For any of the remaining edges, return
“Unknown.”

The basic idea is that we first run the Very Conservative
SGS (VCSGS) algorithm, which, recall, is the
CSGS algorithm (E1) plus a step of testing whether
the output satisfies the Markov condition (E2). If
the test does not pass, we do not estimate any edge;
if the test passes, we estimate those edges that are
into a vertex that is not part of any unoriented edge.
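
The regression in step E4(ii) can be sketched as an ordinary least-squares fit of a vertex on its parents. The function name `estimate_incoming_coefficients`, the simulated data and the coefficient values below are assumptions made for illustration; the sketch also assumes zero-mean variables, so no intercept is fitted, and it takes the parent set as given rather than running the structure search.

```python
import numpy as np

def estimate_incoming_coefficients(data, parents_of_z, z):
    """Least-squares coefficients of column z regressed on its parent columns."""
    X = data[:, parents_of_z]
    y = data[:, z]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(parents_of_z, coef))

# Simulated data from a hypothetical model X -> Z <- Y with coefficients 0.8 and -0.5.
rng = np.random.default_rng(0)
n = 20000
x = rng.standard_normal(n)
y = rng.standard_normal(n)
z = 0.8 * x - 0.5 * y + rng.standard_normal(n)
data = np.column_stack([x, y, z])

# Columns 0 and 1 are the parents of column 2 in the (assumed) oriented output.
estimates = estimate_incoming_coefficients(data, parents_of_z=[0, 1], z=2)
```

The returned dictionary maps each parent column to its estimated structural coefficient, which is what E4(ii) reports for edges into a fully oriented vertex.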

Let $M_1$ be an output of the Edge Estimation algorithm,
and $M_2$ be a causal model. We define the
structural coefficient distance, $d[M_1, M_2]$, between
$M_1$ and $M_2$ to be

$$d[M_1, M_2] = \max_{i,j} |e_{M_1}(X_i \to X_j) - e_{M_2}(X_i \to X_j)|,$$

where by convention $|e_{M_1}(X_i \to X_j) - e_{M_2}(X_i \to X_j)| = 0$
if $e_{M_1}(X_i \to X_j)$ = “Unknown.”

Intuitively, the structural coefficient distance be¬ 
tween the output and the true causal model mea¬ 
sures the (largest) estimation error the Edge Esti¬ 
mation algorithm makes. Our goal is to show that 
under the specified assumptions, the Edge Estima¬ 
tion algorithm is uniformly consistent, in the sense 
that for every δ > 0, the probability of the structural
coefficient distance between the output and the true
model being greater than δ uniformly converges to
zero. 

Obviously, by placing no penalty on the uninformative answer of “Unknown,” there is a trivial algorithm that is uniformly consistent, namely, the algorithm that always returns “Unknown” for every structural coefficient. For this reason, Robins et al. (2003) also require any admissible algorithm to be nontrivial in the sense that it returns an informative answer (in the large sample limit) for some possible joint distributions. The Edge Estimation algorithm is clearly nontrivial in this sense. There is no guarantee that it will always output an informative answer for some structural coefficient, and rightly so, because there are cases (for example, when the true causal graph is a complete one and there is no prior information about the causal order) in which every structural coefficient is truly underdetermined or unidentifiable. An interesting question, however, is whether a given algorithm is maximally informative or complete in the sense that it returns (in the large sample limit) “Unknown” only on those structural coefficients that are truly underdetermined. The condition in question is of course much stronger than Robins et al.’s condition of nontriviality. We suspect that the Edge Estimation algorithm is not maximally informative in this sense. (We thank an anonymous referee for raising this issue.)

Theorem 2. Given causal sufficiency of the measured variables $\mathbf{V}$, the Causal Markov, $k$-Triangle-Faithfulness, NVV($J$) and UBC($C$) assumptions, the Edge Estimation algorithm is uniformly consistent in the sense that for every $\delta > 0$,

$$\lim_{n\to\infty}\ \sup_{M\in\phi^{k,J,C}} P_M^n\bigl(d[\hat{O}(M), M] > \delta\bigr) = 0,$$

where $\hat{O}(M)$ is the output of the algorithm given an i.i.d. sample from $P_M$.

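Proof. Let $\mathcal{O}$ be the set of possible graphical outputs of the CSGS algorithm. Given $\mathbf{V}$, there are only finitely many graphs in $\mathcal{O}$. So it suffices to show that for each $O \in \mathcal{O}$,

$$\lim_{n\to\infty}\ \sup_{M\in\phi^{k,J,C}} P_M^n\bigl(d[\hat{O}(M),M] > \delta \mid C(L,n,M) = O\bigr)\cdot P_M^n\bigl(C(L,n,M) = O\bigr) = 0.$$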

Given $O$, $\phi^{k,J,C}$ can be partitioned into the following three sets:

$$\Psi_1 = \{M \mid \text{all adjacencies, nonadjacencies and orientations in } O \text{ are true of } M\};$$

$$\Psi_2 = \{M \mid O \text{ contains an adjacency or an orientation not true of } M\};$$

$$\Psi_3 = \{M \mid \text{all adjacencies and orientations in } O \text{ are true of } M, \text{ but some nonadjacencies are not true of } M\}.$$

It suffices to show that for each $\Psi_i$,

$$\lim_{n\to\infty}\ \sup_{M\in\Psi_i} P_M^n\bigl(d[\hat{O}(M),M] > \delta \mid C(L,n,M) = O\bigr)\cdot P_M^n\bigl(C(L,n,M) = O\bigr) = 0.$$

Consider $\Psi_1$ first. Given any $M \in \Psi_1$, the zero estimates in $\hat{O}(M)$ are all correct (since all nonadjacencies are true). For each edge $Y \to Z$ that is estimated, the true structural coefficient $e_M(Y \to Z)$ is simply $r_M(Y, Z, \mathrm{Parents}(O, Z))$, the population regression coefficient for $Y$ when $Z$ is regressed on its parents in $O$, because the set of $Z$’s parents in $O$ is the same as the set of $Z$’s parents in $G_M$.
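The claim that the population regression coefficient on the full parent set recovers the structural coefficient can be checked numerically. The sketch below uses a hypothetical three-variable model (X → Y, X → Z, Y → Z) whose coefficients and error variances are chosen only for illustration, so that every variable has variance 1; the convention that `B[i, j]` holds the coefficient of the edge from variable j to variable i is likewise an assumption of the example.

```python
import numpy as np

# Hypothetical model: X -> Y (0.4), X -> Z (0.5), Y -> Z (-0.3).
# B[i, j] is the structural coefficient of the edge j -> i.
B = np.array([[0.0,  0.0, 0.0],
              [0.4,  0.0, 0.0],
              [0.5, -0.3, 0.0]])
error_var = np.diag([1.0, 0.84, 0.78])   # chosen so each variable has variance 1

# Covariance implied by x = B x + e
A = np.linalg.inv(np.eye(3) - B)
cov = A @ error_var @ A.T

def population_regression(cov, target, predictors):
    """Population coefficients from regressing `target` on `predictors`."""
    s_pp = cov[np.ix_(predictors, predictors)]
    s_pt = cov[predictors, target]
    return np.linalg.solve(s_pp, s_pt)

# Regressing Z on both of its parents returns the structural coefficients
coef_full = population_regression(cov, 2, [0, 1])
# Omitting the parent Y changes the coefficient on X
coef_partial = population_regression(cov, 2, [0])
```

Regressing Z on X alone gives 0.5 + (−0.3)(0.4) = 0.38 rather than 0.5, which is why the argument depends on the parents of Z in $O$ coinciding with the parents of Z in $G_M$.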

The sampling distribution of the estimate of an edge $Y \to Z$ in $O$ is given by

$$\hat{r}_M(Y, Z, \mathrm{Parents}(O, Z), n) \sim N\!\left(r_M(Y, Z, \mathrm{Parents}(O, Z)),\ \frac{\sigma_\varepsilon^2}{n\,\operatorname{var}(Y \mid \mathrm{Parents}(O, Z) \setminus \{Y\})}\right),$$

where $\sigma_\varepsilon^2$ is the variance of the residual for $Z$ when regressed upon $\mathrm{Parents}(O,Z)$ in $P_M$, and $\operatorname{var}(Y \mid \mathrm{Parents}(O, Z) \setminus \{Y\})$ is the variance of $Y$ conditional on $\mathrm{Parents}(O, Z) \setminus \{Y\}$ in $P_M$ (Whittaker, 1990). The numerator of the variance is bounded above by 1, since the variance of each variable is 1, and the residual is independent of the set of variables regressed on. The denominator is bounded away from zero by assumption NVV($J$). Hence, sample regression coefficients are uniformly consistent estimators of population regression coefficients under our assumptions, and we have

$$\lim_{n\to\infty}\ \sup_{M\in\Psi_1} P_M^n\bigl(d[\hat{O}(M), M] > \delta \mid C(L,n,M) = O\bigr)\cdot P_M^n\bigl(C(L,n,M) = O\bigr)$$

$$\le \lim_{n\to\infty}\ \sup_{M\in\Psi_1} P_M^n\bigl(d[\hat{O}(M), M] > \delta \mid C(L,n,M) = O\bigr)$$

$$= 0.$$
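The variance expression used in the argument for $\Psi_1$ can be checked by simulation. The following seeded Monte Carlo sketch is an illustration under assumed values (coefficients 0.5 and −0.3, correlation 0.4 between the two parents, residual variance 0.78, all hypothetical); it compares the empirical variance of the least-squares coefficient with $\sigma_\varepsilon^2 / (n \operatorname{var}(Y \mid W))$, where W stands for the other parent of Z.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 400, 2000
b_y, b_w, rho = 0.5, -0.3, 0.4   # hypothetical coefficients; rho = corr(Y, W)
resid_var = 0.78                 # variance of the residual of Z

estimates = np.empty(reps)
for r in range(reps):
    w = rng.normal(size=n)
    y = rho * w + np.sqrt(1 - rho**2) * rng.normal(size=n)   # var(Y) = 1
    z = b_y * y + b_w * w + np.sqrt(resid_var) * rng.normal(size=n)
    X = np.column_stack([y, w])
    coef, *_ = np.linalg.lstsq(X, z, rcond=None)
    estimates[r] = coef[0]       # sample regression coefficient of Y

var_y_given_w = 1 - rho**2       # conditional variance of Y given W
predicted = resid_var / (n * var_y_given_w)
empirical = estimates.var()
ratio = empirical / predicted
```

The ratio should be close to 1 for moderately large $n$. The bound in the text comes from the numerator being at most 1 and the conditional variance being at least $J$, so the variance of the estimate is at most $1/(nJ)$ uniformly over the models considered.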

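For $\Psi_2$, note that given any $M \in \Psi_2$, the CSGS algorithm errs if it outputs $O$. Thus, by Theorem 1,

$$\lim_{n\to\infty}\ \sup_{M\in\Psi_2} P_M^n\bigl(d[\hat{O}(M),M] > \delta \mid C(L,n,M) = O\bigr)\cdot P_M^n\bigl(C(L,n,M) = O\bigr) \le \lim_{n\to\infty}\ \sup_{M\in\Psi_2} P_M^n\bigl(C(L,n,M) = O\bigr) = 0.$$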

Now consider $\Psi_3$. Let $O(M)$ be the population version of $\hat{O}(M)$, that is, all the sample regression coefficients in $\hat{O}(M)$ are replaced by the corresponding population coefficients. Since sample regression coefficients are uniformly consistent estimators of population regression coefficients under our assumptions, and there are only finitely many regression coefficients to consider, for every $\varepsilon > 0$, there is a sample size $N_1$, such that for all $n > N_1$, and all $M \in \Psi_3$,

$$P_M^n\bigl(d[\hat{O}(M), O(M)] > \delta/2 \mid C(L,n,M) = O\bigr) < \varepsilon.$$

For any $M \in \Psi_3$, there are some edges in $G_M$ missing in $O$. Let $\mathbf{E}(M)$ be the set of edges missing in $O$. Let $M'$ be the same as $M$ except that the structural coefficients associated with the edges in $\mathbf{E}(M)$ are set to zero. Let $O(M')$ be the same as $O(M)$ except that for each edge with an identified coefficient, the coefficient in $O(M')$ is the relevant regression coefficient derived from $P_{M'}$ [whereas that in $O(M)$ is derived from $P_M$]. By the setup of $M'$, the identified edge coefficients in $O(M')$ are equal to the corresponding edge coefficients in $M'$, which are the same as the corresponding edge coefficients in $M$. Thus, the structural coefficient distance between $O(M')$ and $M$ is simply

$$d[O(M'),M] = \max_{\langle i,j\rangle \in \mathbf{E}(M)} \bigl|e_M(X_i \to X_j)\bigr|.$$

For any edge $Y \to Z$ in $O$ that has a different edge coefficient in $O(M)$ than that in $O(M')$, the edge coefficients are both derived from a regression of $Z$ on $\mathrm{Parents}(O, Z)$, but one is based on $P_M$ and the other is based on $P_{M'}$. The regression coefficient $r(Y, Z, \mathrm{Parents}(O, Z))$ is equal to the $Y$ component of the vector $\operatorname{cov}(Z, \mathrm{Parents}(O, Z)) \times \operatorname{var}^{-1}(\mathrm{Parents}(O, Z))$ (Whittaker, 1990), which, given the structure $G_M$, is a rational function of the structural coefficients in $M$. Since $M \in \phi^{k,J,C}$, every submatrix of the covariance matrix for $P_M$ is invertible, and so $r_M(Y, Z, \mathrm{Parents}(O, Z))$ is defined. For $M'$, $r_{M'}(Y, Z, \mathrm{Parents}(O, Z)) = r_{M'}(Y, Z, \mathbf{A})$, where $\mathbf{A}$ is the smallest ancestral set that contains $\mathrm{Parents}(O,Z)$ in $G_M$. $\operatorname{var}(\mathbf{A})^{-1} = (I - B)^{T} \operatorname{var}(\mathbf{E})^{-1}(I - B)$, where $B$ is the submatrix of structural coefficients in $M'$ for variables in $\mathbf{A}$, and $\operatorname{var}(\mathbf{E})$ is the diagonal covariance matrix of error terms for variables in $\mathbf{A}$, which is a submatrix of $S_M$. Since $M \in \phi^{k,J,C}$, the variance of every error term is bounded from below by $J$. Thus, $\operatorname{var}(\mathbf{A})^{-1}$ is defined and so is $r_{M'}(Y, Z, \mathrm{Parents}(O, Z))$. Therefore, $r_M(Y, Z, \mathrm{Parents}(O, Z))$ and $r_{M'}(Y, Z, \mathrm{Parents}(O, Z))$ are values of a rational function of the structural coefficients.

A continuous function is uniformly continuous on a closed, bounded interval anywhere that it is defined. A rational function is continuous at every point of its domain where its denominator is not zero, that is, where the function value is defined. By Lemma 2 and assumption UBC($C$), every structural coefficient $b_{j,i}$ in $M$ lies in a closed bounded interval. Obviously the coefficients in $M'$ still lie in this interval. Hence, given $G_M$, the difference between $r_M(Y, Z, \mathrm{Parents}(O, Z))$ and $r_{M'}(Y, Z, \mathrm{Parents}(O, Z))$ can be arbitrarily small if the differences between the structural coefficients in $M'$ and those in $M$ are sufficiently small. Given the set of variables $\mathbf{V}$, there are only finitely many structures and finitely many relevant regressions to consider. Therefore, there is a $\gamma \in (0, \delta/4)$ such that for every $M \in \Psi_3$, $d[O(M), O(M')] < \delta/4$ if

$$\max_{\langle i,j\rangle \in \mathbf{E}(M)} \bigl|e_M(X_i \to X_j)\bigr| < \gamma.$$
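The continuity argument above can be illustrated numerically. In the sketch below, a hypothetical model $M$ has edges X → Y, X → Z and a weak edge Y → Z that is assumed to be missing from $O$, so that the parents of Z in $O$ are just {X}; $M'$ sets the weak coefficient to zero. All numerical values, the unit error variances, and the helper names are assumptions of the example, and the variables are not standardized here.

```python
import numpy as np

def implied_cov(B, error_var):
    """Covariance implied by x = B x + e, with B[i, j] the coefficient of j -> i."""
    A = np.linalg.inv(np.eye(B.shape[0]) - B)
    return A @ np.diag(error_var) @ A.T

def regression_on(cov, target, predictors):
    """Population coefficients from regressing `target` on `predictors`."""
    return np.linalg.solve(cov[np.ix_(predictors, predictors)],
                           cov[predictors, target])

error_var = np.array([1.0, 1.0, 1.0])     # illustrative error variances
X, Y, Z = 0, 1, 2
gaps = []
for small in (0.2, 0.02, 0.002):
    B_M = np.array([[0.0, 0.0,   0.0],
                    [0.4, 0.0,   0.0],    # X -> Y
                    [0.5, small, 0.0]])   # X -> Z and the weak edge Y -> Z
    B_Mp = B_M.copy()
    B_Mp[Z, Y] = 0.0                      # M': weak edge coefficient set to zero
    r_M = regression_on(implied_cov(B_M, error_var), Z, [X])[0]
    r_Mp = regression_on(implied_cov(B_Mp, error_var), Z, [X])[0]
    gaps.append(abs(r_M - r_Mp))

# Check of the identity var(A)^{-1} = (I - B)^T var(E)^{-1} (I - B)
I_minus_B = np.eye(3) - B_M
precision = I_minus_B.T @ np.diag(1.0 / error_var) @ I_minus_B
identity_holds = np.allclose(precision,
                             np.linalg.inv(implied_cov(B_M, error_var)))
```

In this example the gap between the two regression coefficients is 0.4 times the weak coefficient, so it shrinks in proportion as the missing edge becomes weaker, which is the behavior the choice of $\gamma$ relies on.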

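Consider then the partition of $\Psi_3$ into

$$\Psi_{3.1} = \Bigl\{M \in \Psi_3 \Bigm| \max_{\langle i,j\rangle \in \mathbf{E}(M)} \bigl|e_M(X_i \to X_j)\bigr| < \gamma\Bigr\}$$

and

$$\Psi_{3.2} = \Bigl\{M \in \Psi_3 \Bigm| \max_{\langle i,j\rangle \in \mathbf{E}(M)} \bigl|e_M(X_i \to X_j)\bigr| \ge \gamma\Bigr\}.$$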

It follows from the previous argument that for every $M \in \Psi_{3.1}$, $d[O(M),M] \le d[O(M), O(M')] + d[O(M'),M] < \delta/4 + \gamma < \delta/2$. Then there is a sample size $N_1$, such that for all $n > N_1$ and all $M \in \Psi_{3.1}$,

$$P_M^n\bigl(d[\hat{O}(M),M] > \delta \mid C(L,n,M) = O\bigr) \le P_M^n\bigl(d[\hat{O}(M),O(M)] > \delta/2 \mid C(L,n,M) = O\bigr) < \varepsilon.$$

For every $M \in \Psi_{3.2}$, there is at least one edge, say, $X \to Y$, missing from $O$ such that $|e_M(X \to Y)| \ge \gamma$. Then by Lemma 2, there is a set $\mathbf{U}$ such that $|\rho(X,Y \mid \mathbf{U})|$ is bounded below by a positive constant (the bound supplied by Lemma 2), but $O$ entails that $\rho(X,Y \mid \mathbf{U}) = 0$. Thus, the test of the Markov condition in step E2 is passed only if the test of $\rho(X,Y \mid \mathbf{U}) = 0$ returns 0 (i.e., accepts the null hypothesis). Note that if the test is not passed, then every structural coefficient is “Unknown,” and so by definition the structural coefficient distance is zero. Therefore, the distance is greater than $\delta$ (and so nonzero) only if the test of $\rho(X,Y \mid \mathbf{U}) = 0$ returns 0 while $|\rho(X,Y \mid \mathbf{U})|$ is at least that constant. Since tests are uniformly consistent, it follows that there is a sample size $N_2$, such that for all $n > N_2$ and all $M \in \Psi_{3.2}$,

$$P_M^n\bigl(d[\hat{O}(M),M] > \delta \mid C(L,n,M) = O\bigr) < \varepsilon.$$

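Let $N = \max(N_1, N_2)$. Then for all $n > N$,

$$\sup_{M\in\Psi_3} P_M^n\bigl(d[\hat{O}(M),M] > \delta \mid C(L,n,M) = O\bigr)\cdot P_M^n\bigl(C(L,n,M) = O\bigr) \le \sup_{M\in\Psi_3} P_M^n\bigl(d[\hat{O}(M),M] > \delta \mid C(L,n,M) = O\bigr) < \varepsilon. \qquad \square$$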

5. CONCLUSION 

We have shown that there is a pointwise consistent estimator of causal patterns and a uniformly consistent estimator of some of the structural coefficients in causal patterns, even when the Causal Faithfulness and Strong Causal Faithfulness assumptions are substantially weakened. The $k$-Triangle-Faithfulness assumption is a restriction on many fewer partial correlations than the Causal Faithfulness and Strong Causal Faithfulness assumptions, and does not entail that there are no edges with very small but nonzero structural coefficients.

There are a number of open problems associated 
with the Causal Faithfulness assumption: 

1. Is it possible to speed up the Very Conservative 
SGS algorithm to make it applicable to data sets 
with large numbers of variables? 

2. If unfaithfulness is detected, is it possible to 
reduce the number of structural coefficients where 
the algorithm returns “Unknown?” 

3. In practice, on realistic sample sizes, how does the Very Conservative SGS algorithm perform? [Ramsey, Zhang and Spirtes (2006) have already shown that the Conservative PC algorithm is more accurate and not significantly slower than the PC algorithm.]

4. Is the $k$-Triangle-Faithfulness assumption unlikely to hold for reasonable values of $k$ and large numbers of variables?

5. Is there an assumption weaker than the $k$-Triangle-Faithfulness assumption for which there is a uniformly consistent estimator for structural coefficients in a causal pattern?

6. Are there analogous results that apply when the number of variables and the maximum degree of a vertex increase and the size of $k$ decreases with increasing sample size [as in the Kalisch and Bühlmann (2007) results]?

7. Are there analogous results that apply when the assumption of causal sufficiency is abandoned?

8. Are there analogous results that apply for other families of distributions or for nonparametric tests of conditional independence?